I would like to understand the steps that PyMC3 takes when building a model.
I attempt to outline what it does under the hood using numpy and scipy. Note that I’m not sure whether understanding the specific sampling techniques is important (i.e. Hamilton, NUTS, etc). I think Hamilton is easiest as an illustration of what happens, but I’m not sure that it works with multiple parameters.
Please refer here for the model in pm:
import numpy as np
import pymc3 as pm
from scipy import stats
data = np.array([1.72, 1.58, 1.64, 1.58])
with pm.Model() as model1:
alpha = pm.AsymmetricLaplace('alpha', b = 22.14,
kappa = 1.54,
mu = 889.14)
scale_raw = pm.Beta('scale', alpha = 0.75,
beta = 1.51)
scale_transf = pm.Deterministic("scale_transf", scale_raw * scale_best_fit['beta']['scale'] + scale_best_fit['beta']['loc'])
pred = pm.Gamma("Annual_ins", alpha = alpha, beta = 1/scale_transf, observed = data)
trace = pm.sample(2000, tune = 8000)
So, ultimately I am trying to get a posterior distribution of data. I have 2 parameters in the Gamma distribution that I am estimating. Let’s just ignore the complication with the pm.Deterministic and assume that I already have the alpha and betas that I need as priors. So here is my attempt in quasi-pseudocode:
# initialize with random values
alpha_current = ?
beta_current = ?
alpha_array = np.zeros(1000)
beta_array = np.zeros(1000)
for i in range(1000):
# not sure whether this is correct, since the way I have it below seems actually like
# a likelihood
alpha_prior = stats.laplace_asymmetric.pdf(*alpha_params, alpha_current)
beta_prior = stats.beta.pdf(*beta_params, beta_current)
# Calculate likelihood
likelihood_current = stats.gamma.pdf(alpha = alpha_current, beta = beta_current, data)
post_current = np,prod(likelihood_current * alpha_prior * beta_prior)
alpha_proposed = np.random.normal(alpha_current, scale=0.5)
beta_proposed = np.random.normal(beta_current, scale=0.5)
# get priors - *params is prior parameters I specify
alpha_prior = stats.laplace_asymmetric.pdf(*alpha_params, alpha_proposed)
beta_prior = stats.beta.pdf(*beta_params, beta_proposed)
# Calculate likelihood
likelihood_proposed = stats.gamma.pdf(alpha = alpha_proposed, beta = beta_proposed, observed = data)
# Create posterior
post_proposed = np.prod(likelihood_proposed * alpha_prior * beta_prior)
# Compute the probability of move
ratio = post_proposed / post_current
p_move = min(ratio, 1)
random_draw = np.random.uniform(0,1)
if (random_draw < p_move):
alpha_current = alpha_proposed
beta_current = beta_proposed
# Store the current value
alpha_array[i] = alpha_current
beta_array[i] = beta_current
Does it matter that I have 4 values in the likelihood but only 1 value each for the alpha and beta? Is alpha and beta what I’m actually updating every time as I’ve specified?