Mean of mu and mean of predictive posterior distribution

Hi everyone,
question updated:

I am new to PyMC3 and am trying to understand its inner workings using a linear regression example:

import numpy as np
import pymc3 as pm

with pm.Model() as m_5_1:

    marriage_age_s = pm.Data("marriage_age_s", df['MedianAgeMarriage_s'].values)
    divorce_rate_s = pm.Data("divorce_rate_s", df['Divorce_s'])

    alpha = pm.Normal("alpha", mu=0, sd=0.2)
    beta_a = pm.Normal("beta_a", mu=0, sd=0.5)
    sigma = pm.Exponential("sigma", 1)
    
    mu = pm.Deterministic("mu", alpha + beta_a*marriage_age_s)
    
    divorce_rate_hat = pm.Normal("divorce_rate_hat", mu=mu, sd=sigma, observed=divorce_rate_s)
    
    m_5_1_trace = pm.sample(1000, tune=1000, return_inferencedata=False)
    posterior_pred = pm.sample_posterior_predictive(m_5_1_trace)

I am trying to grasp the difference between the mean of the mu values from the trace and the mean of the posterior predictive distribution.

In particular:
The following code gets me the means of the mu values for the input datapoints:

age_std_seq = df['MedianAgeMarriage_s']
mu_pred = np.zeros((len(age_std_seq), len(m_5_1_trace) * m_5_1_trace.nchains))

for i, age_std in enumerate(age_std_seq):
    mu_pred[i] = m_5_1_trace["alpha"] + m_5_1_trace["beta_a"] * age_std

mu_pred.mean(axis=1)

I get exactly the same values by taking the mean of the deterministic mu:
m_5_1_trace['mu'].mean(axis=0)

But for the same model, when I sample from the posterior predictive distribution and average the predicted values like this:

posterior_pred['divorce_rate_hat'].mean(axis=0)

I get very similar, but slightly different values.

What causes the difference? I was thinking that the values should be the same - what am I missing?

Welcome!

The observed divorce_rate_hat is centered at mu, but it is distributed around mu with sd=sigma. Your calculation of mu_pred recreates the “best fitting regression line” (i.e., mu) but doesn’t incorporate sigma, whereas the posterior predictions do. To recreate the posterior predictions, you would take each draw of mu_pred, plug it in as the location (mean) of a normal distribution, grab the value of sigma from the same draw of the trace and plug it in as the scale (SD), and then draw a random value from that distribution. Because that added noise has mean zero, averaging the posterior predictive draws gives values close to the mean of mu, but not identical: what remains is Monte Carlo noise from the finite number of draws.
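
Here is a minimal NumPy sketch of that recipe, in case it helps to see it spelled out. It does not use your actual trace: alpha, beta_a, sigma and age_std_seq below are made-up stand-ins (hypothetical values, not your data), so swap in m_5_1_trace["alpha"], m_5_1_trace["beta_a"], m_5_1_trace["sigma"] and df['MedianAgeMarriage_s'].values to try it on the real model.

```python
import numpy as np

rng = np.random.default_rng(0)

# ASSUMPTION: synthetic stand-ins for the posterior draws. With the real model
# these would be m_5_1_trace["alpha"], m_5_1_trace["beta_a"], m_5_1_trace["sigma"].
n_draws = 4000
alpha = rng.normal(0.0, 0.1, size=n_draws)
beta_a = rng.normal(-0.6, 0.15, size=n_draws)
sigma = np.abs(rng.normal(0.8, 0.08, size=n_draws))

# ASSUMPTION: made-up standardized predictor values standing in for
# df['MedianAgeMarriage_s'].values.
age_std_seq = np.linspace(-2.0, 2.0, 50)

# mu for every (draw, data point) pair: shape (n_draws, n_points),
# the same layout as m_5_1_trace["mu"].
mu_pred = alpha[:, None] + beta_a[:, None] * age_std_seq[None, :]

# One simulated observation per (draw, data point), using mu as the mean
# and the sigma from the same draw as the standard deviation.
y_pred = rng.normal(loc=mu_pred, scale=sigma[:, None])

mu_mean = mu_pred.mean(axis=0)  # analogous to m_5_1_trace['mu'].mean(axis=0)
y_mean = y_pred.mean(axis=0)    # analogous to posterior_pred['divorce_rate_hat'].mean(axis=0)

# The two sets of means are close but not identical.
print("largest gap between the means:", np.abs(y_mean - mu_mean).max())

# The spread is where the two really differ: y_pred carries the extra sigma noise.
print("average sd of mu draws:       ", mu_pred.std(axis=0).mean())
print("average sd of predictive draws:", y_pred.std(axis=0).mean())
```

The gap between the two means shrinks as the number of draws grows, while the difference in spread stays. That is why the averages look nearly the same even though the two quantities answer different questions: mu_pred describes uncertainty about the regression line, y_pred describes uncertainty about a new observation.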

Thank you very much for the detailed explanation!
