I would like to understand the steps that PyMC3 takes when building a model.
I attempt to outline what it does under the hood using numpy and scipy. Note that I’m not sure whether understanding the specific sampling techniques is important (i.e. Hamilton, NUTS, etc). I think Hamilton is easiest as an illustration of what happens, but I’m not sure that it works with multiple parameters.
Please refer here for the model in pm:
import numpy as np import pymc3 as pm from scipy import stats data = np.array([1.72, 1.58, 1.64, 1.58]) with pm.Model() as model1: alpha = pm.AsymmetricLaplace('alpha', b = 22.14, kappa = 1.54, mu = 889.14) scale_raw = pm.Beta('scale', alpha = 0.75, beta = 1.51) scale_transf = pm.Deterministic("scale_transf", scale_raw * scale_best_fit['beta']['scale'] + scale_best_fit['beta']['loc']) pred = pm.Gamma("Annual_ins", alpha = alpha, beta = 1/scale_transf, observed = data) trace = pm.sample(2000, tune = 8000)
So, ultimately I am trying to get a posterior distribution of data. I have 2 parameters in the Gamma distribution that I am estimating. Let’s just ignore the complication with the pm.Deterministic and assume that I already have the alpha and betas that I need as priors. So here is my attempt in quasi-pseudocode:
# initialize with random values alpha_current = ? beta_current = ? alpha_array = np.zeros(1000) beta_array = np.zeros(1000) for i in range(1000): # not sure whether this is correct, since the way I have it below seems actually like # a likelihood alpha_prior = stats.laplace_asymmetric.pdf(*alpha_params, alpha_current) beta_prior = stats.beta.pdf(*beta_params, beta_current) # Calculate likelihood likelihood_current = stats.gamma.pdf(alpha = alpha_current, beta = beta_current, data) post_current = np,prod(likelihood_current * alpha_prior * beta_prior) alpha_proposed = np.random.normal(alpha_current, scale=0.5) beta_proposed = np.random.normal(beta_current, scale=0.5) # get priors - *params is prior parameters I specify alpha_prior = stats.laplace_asymmetric.pdf(*alpha_params, alpha_proposed) beta_prior = stats.beta.pdf(*beta_params, beta_proposed) # Calculate likelihood likelihood_proposed = stats.gamma.pdf(alpha = alpha_proposed, beta = beta_proposed, observed = data) # Create posterior post_proposed = np.prod(likelihood_proposed * alpha_prior * beta_prior) # Compute the probability of move ratio = post_proposed / post_current p_move = min(ratio, 1) random_draw = np.random.uniform(0,1) if (random_draw < p_move): alpha_current = alpha_proposed beta_current = beta_proposed # Store the current value alpha_array[i] = alpha_current beta_array[i] = beta_current
Does it matter that I have 4 values in the likelihood but only 1 value each for the alpha and beta? Is alpha and beta what I’m actually updating every time as I’ve specified?