Hi everyone,
question updated:
I am new to PyMC3 and am trying to understand its inner workings using a linear regression example:
import numpy as np
import pymc3 as pm

# df is a pandas DataFrame holding the standardized columns used below
with pm.Model() as m_5_1:
    marriage_age_s = pm.Data("marriage_age_s", df['MedianAgeMarriage_s'].values)
    divorce_rate_s = pm.Data("divorce_rate_s", df['Divorce_s'])
    # Priors
    alpha = pm.Normal("alpha", mu=0, sd=0.2)
    beta_a = pm.Normal("beta_a", mu=0, sd=0.5)
    sigma = pm.Exponential("sigma", 1)
    # Linear model for the mean, stored in the trace as "mu"
    mu = pm.Deterministic("mu", alpha + beta_a*marriage_age_s)
    # Likelihood
    divorce_rate_hat = pm.Normal("divorce_rate_hat", mu=mu, sd=sigma, observed=divorce_rate_s)
    m_5_1_trace = pm.sample(1000, tune=1000, return_inferencedata=False)
    posterior_pred = pm.sample_posterior_predictive(m_5_1_trace)
I am trying to grasp the difference between the mean of the mu values from the trace and the mean of the posterior predictive distribution.
In particular:
The following code gets me the means of the mu values for the input datapoints:
age_std_seq = df['MedianAgeMarriage_s']
# One row per data point, one column per posterior draw (all chains combined)
mu_pred = np.zeros((len(age_std_seq), len(m_5_1_trace) * m_5_1_trace.nchains))
for i, age_std in enumerate(age_std_seq):
    mu_pred[i] = m_5_1_trace["alpha"] + m_5_1_trace["beta_a"] * age_std
mu_pred.mean(axis=1)
I get exactly the same values by taking the mean of the deterministic mu:
m_5_1_trace['mu'].mean(axis=0)
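That these two agree can be checked without the model at all. Below is a minimal NumPy-only sketch; the draws for alpha and beta_a and the grid of ages are synthetic stand-ins that I made up for illustration, not values from my trace or data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumptions, not the real trace or data)
n_draws = 4000
alpha_draws = rng.normal(0.0, 0.1, size=n_draws)    # plays the role of m_5_1_trace["alpha"]
beta_draws = rng.normal(-0.6, 0.15, size=n_draws)   # plays the role of m_5_1_trace["beta_a"]
age_std_seq = np.linspace(-2.0, 2.0, 50)            # plays the role of the standardized ages

# Loop version: one row per data point, one column per draw
mu_loop = np.zeros((len(age_std_seq), n_draws))
for i, age_std in enumerate(age_std_seq):
    mu_loop[i] = alpha_draws + beta_draws * age_std

# Broadcast version: shape (n_draws, n_points), laid out like m_5_1_trace["mu"]
mu_det = alpha_draws[:, None] + beta_draws[:, None] * age_std_seq[None, :]

# Both averages are taken over the draws, just along different axes
same = np.allclose(mu_loop.mean(axis=1), mu_det.mean(axis=0))
print(same)
```

Both versions apply the same formula to the same draws, so averaging over the draw axis should give matching results up to floating-point rounding.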
But for the same model, when I sample the posterior predictive distribution and average the predicted values like this:
posterior_pred['divorce_rate_hat'].mean(axis=0)
I get very similar, but slightly different values.
What causes the difference? I was thinking that the values should be the same - what am I missing?
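To make the question concrete, here is a NumPy-only sketch of what I understand the two quantities to be. It assumes that each posterior predictive draw is the corresponding mu plus Normal(0, sigma) noise, which is my reading of the likelihood line in the model rather than something I have confirmed in PyMC3's internals. All draws are synthetic stand-ins, not my actual trace:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins for posterior draws (assumptions, not the real trace)
n_draws, n_points = 4000, 50
age = np.linspace(-2.0, 2.0, n_points)
alpha = rng.normal(0.0, 0.1, size=n_draws)
beta = rng.normal(-0.6, 0.15, size=n_draws)
sigma = rng.exponential(0.8, size=n_draws)

# Deterministic mean for every draw and data point, shape (n_draws, n_points)
mu = alpha[:, None] + beta[:, None] * age[None, :]

# Assumed form of a posterior predictive draw: observation noise added on top of mu
y_pred = rng.normal(loc=mu, scale=sigma[:, None])

# Gap between the two averages at each data point
gap = np.abs(y_pred.mean(axis=0) - mu.mean(axis=0))

# Rough Monte Carlo scale of the averaged noise
mc_scale = np.sqrt(np.mean(sigma**2) / n_draws)
print(gap.max(), mc_scale)
```

If this reading is right, the averaged noise is close to zero but not exactly zero for a finite number of draws, so the gap should shrink as the number of draws grows. Is that the whole explanation, or is something else going on?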