# Calculations using posterior of input parameters differs in distribution from the posterior predictive

I’m having an issue where if I perform a calculation using the posterior of input variables from the trace the result is distributed differently to the posterior predictive of the model.

MODEL EXPLANATION (for context, can probably skip this):
I have fit a model of a chemical reaction to some observed data where:

CHL1 = f (CHL0, A0, A1, time(t), temperature(Temps))

A0 and A1 are normally distributed variables
CHL0, t, and Temps are measured data
CHL1 is a normal distribution with some Standard Deviation named eps

``````# Constants and measured parameters

# Gas constant
R = 8.3145 # J/mol/k

# Sodium hypochlorite molar mass
M = 74.44 # g/mol

# Activation energies
Ea1 = 24.8 # kCal/mol
Ea2 = 20.8 # kCal/mol
Ea0 = 21 #kCal/mol

with pm.Model() as model:

# Activation energies
Ea1 = Ea1 * 4.184 * 1000 # Convert to J/mol
Ea0 = Ea0 * 4.184 * 1000 # Convert to J/mol

# Temperatures
Temps = pm.Data('Temps', coogee_df['Avg Temp'].values) # oC
Temps = 273 + Temps # K
# Transit times
t = pm.Data('t', coogee_df['Transit'].values) # days
t = t * 60 * 24 # conver to minutes

# Initial concentrations
CHL0 = pm.Data('CHL0', coogee_df['CHL0'].values) # % W/V
x0 = CHL0/100/M*1000 # mol/L

# Final concentrations
measured = pm.Data('CHL1', coogee_df['CHL1'].values) # % W/V

# Arrhenius coefficients
A1 = pm.Normal('A1', mu=2.1, sigma=0.5)
# A2 = pm.Normal('A2', mu=3.2e11, sigma=3.e10)
A0 = pm.Normal('A0', mu=1, sigma=1)

A1 = A1 * 10 ** 12
A0 = A0 * 10 ** 10

# k0, k1
k1 = A1*np.exp(-Ea1/(R*Temps))
k0 = A0*np.exp(-Ea0/(R*Temps))

# b
b = 3/2*k1 + k0/x0

# Final concentration
x1 = 1/(1/x0 + b/k0*(np.exp(k0*t)-1)) # Mol/L
CHL1 = x1*M/1000*100 # % W/V.

# Error term
eps = pm.HalfNormal('eps', 0.1)

# Likelihood
pm.Normal('obs', mu=CHL1, sigma=eps, observed=measured)
``````

When I run sample on the model I get the following results:

mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
A1 2.161 0.503 1.217 3.102 0.007 0.005 5248.0 5580.0 1.0
A0 1.316 0.255 0.842 1.800 0.004 0.002 5270.0 5498.0 1.0
eps 0.319 0.030 0.264 0.377 0.000 0.000 6684.0 6010.0 1.0

These all make sense to me. A1 and A0 are what I would expect and based on the observed data a Standard Deviation of 0.319 is believable.

I wanted to check the model by putting through a single set of parameters for CHL0, t, and Temp so I made an identical model which uses the posterior of A1 and A0.

``````# TROUBLESHOOTING USING POSTERIORS

# Gas constant
R = 8.3145 # J/mol/k

# Sodium hypochlorite molar mass
M = 74.44 # g/mol

# Activation energies
Ea1 = 24.8 * 4.184 * 1000 # kCal/mol
Ea2 = 20.8 * 4.184 * 1000 # kCal/mol
Ea0 = 21.0 * 4.184 * 1000 #kCal/mol

# Test parameters and target
CHL0 = 13.0 # % W/V
t = 4 # days
t = 60 * 24 * t # seconds
T = 27.7 # oC
T = 273 + T # K
CHL1_target = 12.25 # % W/V, real measured result (not used in model)

# Model
x0 = CHL0/100/M*1000 # mol/L

# Arrhenius coefficients
A1 = idata['posterior']['A1'].values
A0 = idata['posterior']['A0'].values

A1 = A1 * 10 ** 12
A0 = A0 * 10 ** 10

# k0, k1
k1 = A1*np.exp(-Ea1/(R*T))
k0 = A0*np.exp(-Ea0/(R*T))

# b
b = 3/2*k1 + k0/x0

x1 = 1/(1/x0 + b/k0*(np.exp(k0*t)-1))
CHL1 = x1*M/1000*100 # % W/V

print(CHL1.mean())
plt.hist(CHL1, bins=50)
plt.ticklabel_format(useOffset=False)
plt.xticks(rotation=45)
``````

THE ISSUE:
When I take a mean of the CHL1 calculated using the posteriors of A0 and A1 I get the correct value but when I plot a histogram of CHL1 the distribution seems too narrow. I’m expecting a mean of ~12.1 but the whole distribution is between 11.95 and 12.25.

I went ahead and plotted the distributions using CHL1 as the mean and the eps posterior which results in a much wider plot 10.5 to 13.

This also matches with the results from setting new model data and doing a pm.sample_posterior_predictive.

``````with model:
pm.set_data({'Temps': [27.7], 't': , 'CHL0': })
pred_data = pm.sample_posterior_predictive(idata)

predictions = pred_data['posterior_predictive']['obs']
for i in range(len(predictions)):
plt.hist(predictions[i], bins=50, alpha=0.6)
plt.ticklabel_format(useOffset=False)
plt.xticks(rotation=45)
``````

I assume I’m misusing something here but don’t really understand what. My confusion is around the histogram of the results calculated from input parameter posteriors being so much narrower than the histogram of the posterior predictive samples?
Are they meant to be the same and I’ve made an error in my test or is sample_posterior_predictive doing more than just calculating results based on the trace and then fitting a normal distribution to those results?

I didn’t read through the model and your post-sampling calculations carefully, but one thing you might want to try is wrapping some of your intermediate calculations in a `pm.Deterministic()`. That way you will have those intermediate quantities in your posterior, which might help you to figure out where things might be deviating from your own calculations. Just a thought.

You are comparing `obs` with `CHL1` but they are different quantities. `CHL1` is a posterior variable representing the mean of the `obs` variable, hence the reduced dispersion. `obs` is a posterior predictive variable, it includes a further sampling/intergation step compare to posterior variables.

`pm.sample_posterior_predictive` takes care of this extra sampling step, your 2nd plot with multiple lines also kind of takes care of this, you are plotting distributions whose mean is `CHL1` and their dispersion is `eps` which is different from plotting a histogram with all the values of `CHL1`.

I thought it would be something like this but couldn’t formalise it in my own head.

Thank you.