Example of computing posterior from prior and likelihood under the hood

I would like to understand the steps that PyMC3 takes when building a model.

I attempt to outline what it does under the hood using numpy and scipy. Note that I’m not sure whether understanding the specific sampling techniques (e.g. Hamiltonian Monte Carlo, NUTS, etc.) is important. I think Hamiltonian Monte Carlo is the easiest illustration of what happens, but I’m not sure that it works with multiple parameters.

For reference, here is the model in PyMC3:

import numpy as np
import pymc3 as pm
from scipy import stats 

data = np.array([1.72, 1.58, 1.64, 1.58])

with pm.Model() as model1:
    alpha = pm.AsymmetricLaplace('alpha', b=22.14, kappa=1.54, mu=889.14)

    scale_raw = pm.Beta('scale', alpha=0.75, beta=1.51)
    # scale_best_fit holds previously fitted loc/scale values (defined elsewhere)
    # and rescales the Beta draw onto the right range
    scale_transf = pm.Deterministic(
        "scale_transf",
        scale_raw * scale_best_fit['beta']['scale'] + scale_best_fit['beta']['loc'])

    # Gamma parameterized by shape alpha and rate beta, so beta = 1/scale
    pred = pm.Gamma("Annual_ins", alpha=alpha, beta=1/scale_transf, observed=data)

    trace = pm.sample(2000, tune=8000)

So, ultimately I am trying to get a posterior distribution of the data. I have two parameters of the Gamma distribution that I am estimating. Let’s just ignore the complication with the pm.Deterministic and assume that I already have the alpha and beta priors that I need. So here is my attempt in quasi-pseudocode:

# scipy parameterizations of the priors from the PyMC3 model above:
# pm.AsymmetricLaplace(b, kappa, mu) -> stats.laplace_asymmetric(kappa, loc=mu, scale=1/b)
alpha_params = (1.54, 889.14, 1 / 22.14)  # (kappa, loc, scale)
beta_params = (0.75, 1.51)                # (a, b) for stats.beta

# initialize by drawing from the priors
alpha_current = stats.laplace_asymmetric.rvs(*alpha_params)
beta_current = stats.beta.rvs(*beta_params)

alpha_array = np.zeros(1000)
beta_array = np.zeros(1000)

for i in range(1000):

    # not sure whether this is correct, since evaluating the pdf at the
    # current parameter value seems actually like a likelihood
    alpha_prior = stats.laplace_asymmetric.pdf(alpha_current, *alpha_params)
    beta_prior = stats.beta.pdf(beta_current, *beta_params)

    # Calculate likelihood: one density per observation; scipy's gamma takes
    # shape a and scale = 1/rate, matching beta = 1/scale_transf in the model
    likelihood_current = stats.gamma.pdf(data, a=alpha_current, scale=1/beta_current)

    # unnormalized posterior: product of the 4 likelihood terms times ONE
    # factor per prior (multiplying before np.prod would raise each prior
    # to the 4th power)
    post_current = np.prod(likelihood_current) * alpha_prior * beta_prior

    # symmetric random-walk proposals
    alpha_proposed = np.random.normal(alpha_current, scale=0.5)
    beta_proposed = np.random.normal(beta_current, scale=0.5)

    # get priors - *_params are the prior parameters specified above; a
    # proposal outside the prior support gets density 0 and is effectively
    # always rejected
    alpha_prior = stats.laplace_asymmetric.pdf(alpha_proposed, *alpha_params)
    beta_prior = stats.beta.pdf(beta_proposed, *beta_params)

    # Calculate likelihood at the proposed values
    likelihood_proposed = stats.gamma.pdf(data, a=alpha_proposed, scale=1/beta_proposed)

    # Create posterior
    post_proposed = np.prod(likelihood_proposed) * alpha_prior * beta_prior

    # Compute the probability of move (Metropolis acceptance)
    ratio = post_proposed / post_current
    p_move = min(ratio, 1)
    random_draw = np.random.uniform(0, 1)
    if random_draw < p_move:
        alpha_current = alpha_proposed
        beta_current = beta_proposed

    # Store the current value
    alpha_array[i] = alpha_current
    beta_array[i] = beta_current

Does it matter that I have 4 values in the likelihood but only 1 value each for the alpha and beta priors? And are alpha and beta actually what I’m updating every iteration, as I’ve specified?
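
Separately, my assumption for the “posterior distribution of data” I mentioned above: once the loop finishes, I would discard some burn-in and push each retained (alpha, beta) pair back through the Gamma likelihood, roughly like this (the cutoff of 200 is just a placeholder I picked):

burn_in = 200  # placeholder cutoff, not tuned
alpha_post = alpha_array[burn_in:]
beta_post = beta_array[burn_in:]

# one Gamma draw per retained (alpha, beta) pair; scale = 1/rate as above
post_pred = stats.gamma.rvs(a=alpha_post, scale=1/beta_post)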

It looks approximately correct to me. You can take a look at this notebook for a very simple MCMC (Metropolis) algorithm and compare.
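
(In case the link breaks: the core of such a single-parameter Metropolis sampler is roughly the sketch below, with a standard Normal prior and Normal likelihood purely for illustration, not necessarily what the notebook uses.)

theta_current = 0.0
for i in range(1000):
    theta_proposed = np.random.normal(theta_current, scale=0.5)
    # unnormalized posterior = prior(theta) * likelihood(data | theta)
    post_current = stats.norm.pdf(theta_current) * np.prod(stats.norm.pdf(data, loc=theta_current))
    post_proposed = stats.norm.pdf(theta_proposed) * np.prod(stats.norm.pdf(data, loc=theta_proposed))
    # accept the move with probability min(ratio, 1)
    if np.random.uniform(0, 1) < min(post_proposed / post_current, 1):
        theta_current = theta_proposed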

Thank you! So in the example notebook there is just one parameter (theta), so we multiply the likelihood of theta by the prior of theta to get the posterior. In my case, I’m multiplying the likelihood (which is Gamma) by the priors (on alpha and beta). This is one of the parts I wanted to make sure I wasn’t mixing up.
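
In symbols, I believe the quantity being formed is the unnormalized posterior p(alpha, beta | data) ∝ [prod_i Gamma(x_i | alpha, beta)] * p(alpha) * p(beta), so the four likelihood terms are multiplied together while each prior enters exactly once.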

Yup. More conventionally, you would calculate the log prior probability of each parameter and the log likelihood, and then sum everything up to get the log posterior probability. So you can just keep a running sum, adding in the log likelihood of each observation, then the log prior of any parameter values that have been proposed, etc. You can even have multiple “pieces” to your likelihood (e.g., the likelihood of multiple data sets).
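
For instance, a minimal log-space sketch of the acceptance step, reusing data and the scipy parameter tuples from your code above (the log_posterior helper is just for illustration):

def log_posterior(alpha, beta):
    # running sum of log densities instead of a product of densities
    logp = stats.laplace_asymmetric.logpdf(alpha, *alpha_params)     # log prior
    logp += stats.beta.logpdf(beta, *beta_params)                    # log prior
    logp += np.sum(stats.gamma.logpdf(data, a=alpha, scale=1/beta))  # log likelihood
    return logp

# Metropolis acceptance in log space: subtract instead of divide
log_ratio = log_posterior(alpha_proposed, beta_proposed) - log_posterior(alpha_current, beta_current)
if np.log(np.random.uniform(0, 1)) < log_ratio:
    alpha_current, beta_current = alpha_proposed, beta_proposed

This avoids the numerical underflow you can get from multiplying many small densities together.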