How is marginalization performed by sampling in pymc?

Hello everyone,

I’m trying to understand how PyMC implements a multilevel model, and I’d appreciate your help making sure I’m not building on an incomplete understanding of the implementation.

I’ve taken a simpler example than the Gelman and Hill one: y = ax, where a is a first-level parameter and \mu_a and s_a are second-level parameters. The expression for the posterior is:

$$p(a, \mu_a, s_a \mid y) \;\propto\; p(y \mid a, x)\; p(a \mid \mu_a, s_a)\; p(\mu_a)\; p(s_a)$$

and the marginal posterior of the second-level parameters is obtained by integrating out a:

$$p(\mu_a, s_a \mid y) = \int p(a, \mu_a, s_a \mid y)\, da$$

I am trying to recover the true parameter values as a check on my implementation. I also do not want a free noise variable in the likelihood, so I gave its sigma a small constant value.
The minimal code for this is:

import numpy as np
import pymc as pm
import arviz as az
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)   # seeds pm.sample
np.random.seed(0)                # seeds the data generation below

mu_a_true = 10
s_a_true = 2
n_data_points = 1000

xdata = np.linspace(0, 1, n_data_points)
a_true = np.random.normal(mu_a_true, s_a_true, n_data_points)
ydata = a_true * xdata

with pm.Model() as model:
    xdata = pm.Data('xdata', xdata, mutable=True)

    # Flat hyperpriors on the second-level parameters
    mu_a = pm.Uniform('mu_a', lower=0, upper=100)
    s_a = pm.Uniform('s_a', lower=0, upper=100)
    # First-level parameter: a single scalar a shared across all data points
    a = pm.TruncatedNormal('a', mu=mu_a, sigma=s_a, lower=0, upper=100)

    # Likelihood with a fixed, small noise sigma
    likelihood = pm.Normal('y_samp', mu=a * xdata, sigma=1e-2, observed=ydata, shape=xdata.shape)
    trace = pm.sample(8000, tune=2000, chains=2, random_seed=rng,
                      idata_kwargs={"log_likelihood": True}, return_inferencedata=True)

print(az.summary(trace, round_to=4, hdi_prob=0.95))
x = np.linspace(2, 19, 1000)
y_true = (1 / (s_a_true * np.sqrt(2 * np.pi))) * np.exp(-0.5 * ((x - mu_a_true) / s_a_true) ** 2)
fig = plt.figure()
fig.suptitle(f"mu_a_true: {mu_a_true}, s_a_true: {s_a_true}\n")

ax = fig.add_subplot(1, 1, 1)
posterior = trace.posterior.stack(sample=['chain', 'draw'])
mu_a_samples = posterior['mu_a'].values
s_a_samples = posterior['s_a'].values
ax.set_title(f"mu_a_mean = {round(mu_a_samples.mean(), 2)}, s_a_mean = {round(s_a_samples.mean(), 2)}")
# One grey density curve per posterior draw of (mu_a, s_a)
for mean, std_dev in zip(mu_a_samples, s_a_samples):
    y = (1 / (std_dev * np.sqrt(2 * np.pi))) * np.exp(-0.5 * ((x - mean) / std_dev) ** 2)
    ax.plot(x, y, color='#cdcdcd')
ax.plot(x, y_true, color='r', label="True dist")
y_mean = (1 / (s_a_samples.mean() * np.sqrt(2 * np.pi))) * np.exp(-0.5 * ((x - mu_a_samples.mean()) / s_a_samples.mean()) ** 2)
ax.plot(x, y_mean, color='g', label="mean dist")
ax.legend()
ax.set_xlim(2, 19)
ax.set_ylim(0, 0.5)
plt.show()

I do not get back the true values of the distribution parameters mu_a and s_a (see the plot). For a simple case like this, I expected flat priors to work. I have the following questions; any comments or directions would be a great help.

  1. Is my model implementation correct? In the code, it looks like the normal likelihood is somehow consuming the prior on a (as if nested), whereas in the posterior expression they are independent terms multiplied together (the first and second terms in the expression above).

  2. How does PyMC marginalize out the variable ‘a’ (perform the integral) using sampling? Where can I read more about this? (My current understanding is sketched after this list.)

  3. In this simple case, why don’t the marginal posterior means converge to the true values under flat priors? (See the attached plot.)
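
For question 2, here is my current understanding as a toy NumPy sketch (a generic illustration of Monte Carlo marginalization, not PyMC internals; the distributions and numbers below are arbitrary stand-ins): the sampler draws from the joint posterior, and simply discarding the nuisance coordinate of each joint draw leaves samples distributed according to the marginal over the remaining parameters.

import numpy as np

rng = np.random.default_rng(1)

# Draw from a toy joint p(mu, a) = p(mu) p(a | mu):
#   mu ~ N(10, 1),   a | mu ~ N(mu, 2)
mu_samples = rng.normal(10.0, 1.0, size=100_000)
a_samples = rng.normal(mu_samples, 2.0)  # one a per mu draw -> joint samples (mu, a)

# "Integrating out" mu by sampling = just ignore the mu column of the draws.
# Analytically, the marginal is a ~ N(10, sqrt(1**2 + 2**2)).
print(a_samples.mean())  # ~ 10.0
print(a_samples.std())   # ~ 2.236 == sqrt(5)

If I understand correctly, this is also why trace.posterior['mu_a'] in my plotting code above is already the marginal posterior of mu_a: a was sampled jointly and then simply ignored.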

Thanks a lot for your time!

I assume the model is not converging because it is under-identified. For a given value of a, you have a single observation. So the relationship between a and ax is extremely loose and a could take on many, many different values even conditional on the observed data. It looks like you are trying to compensate for the lack of observed data (per value of a) by cranking down the sigma parameter of y_samp, but that seems less than ideal.

I would encourage you to take a look at the various examples of hierarchical models in the PyMC documentation (e.g., this, this, this, this, this, as well as this) to get an idea of the kinds of data sets typically used with hierarchical models, as well as some of the modeling choices one has to make when designing one. A sketch of such a data set follows.
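
To make this concrete, here is a rough sketch (my own illustration, not taken from those notebooks) of the kind of grouped data hierarchical models are usually applied to, with several observations per first-level parameter a_j and an explicit noise term instead of a tiny fixed sigma; all names and numbers are made up:

import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
n_groups, n_per_group = 50, 20
mu_a_true, s_a_true = 10.0, 2.0

a_true = rng.normal(mu_a_true, s_a_true, n_groups)
group = np.repeat(np.arange(n_groups), n_per_group)  # 20 observations per a_j
x = rng.uniform(0, 1, n_groups * n_per_group)
y = a_true[group] * x + rng.normal(0, 0.1, x.size)   # explicit observation noise

with pm.Model():
    mu_a = pm.Normal("mu_a", mu=10, sigma=10)
    s_a = pm.HalfNormal("s_a", sigma=5)
    a = pm.Normal("a", mu=mu_a, sigma=s_a, shape=n_groups)  # one slope per group
    pm.Normal("y_obs", mu=a[group] * x, sigma=0.1, observed=y)
    idata = pm.sample()

With 20 observations per a_j, each first-level parameter is pinned down by its own data, and mu_a and s_a are in turn identified by the spread of the a_j.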
