I am new to pymc3 and trying to understand its inner workings using a linear regression example:
```python
with pm.Model() as m_5_1:
    marriage_age_s = pm.Data("marriage_age_s", df['MedianAgeMarriage_s'].values)
    divorce_rate_s = pm.Data("divorce_rate_s", df['Divorce_s'])
    alpha = pm.Normal("alpha", mu=0, sd=0.2)
    beta_a = pm.Normal("beta_a", mu=0, sd=0.5)
    sigma = pm.Exponential("sigma", 1)
    mu = pm.Deterministic("mu", alpha + beta_a * marriage_age_s)
    divorce_rate_hat = pm.Normal("divorce_rate_hat", mu=mu, sd=sigma, observed=divorce_rate_s)
    m_5_1_trace = pm.sample(1000, tune=1000, return_inferencedata=False)
    posterior_pred = pm.sample_posterior_predictive(m_5_1_trace)
```
I am trying to grasp the difference between the mean of the `mu` values from the trace and the mean of the posterior predictive samples.
The following code gets me the means of the mu values for the input datapoints:
```python
age_std_seq = df['MedianAgeMarriage_s']
mu_pred = np.zeros((len(age_std_seq), len(m_5_1_trace) * m_5_1_trace.nchains))
for i, age_std in enumerate(age_std_seq):
    mu_pred[i] = m_5_1_trace["alpha"] + m_5_1_trace["beta_a"] * age_std
mu_pred.mean(axis=1)
```
I get exactly the same values by taking the mean of the deterministic `mu`, i.e. `m_5_1_trace["mu"].mean(axis=0)`.
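That agreement makes sense to me, since the deterministic `mu` just records the same arithmetic once per draw. A pure NumPy mock-up (all draws below are simulated stand-ins for the trace, not the real posterior) reproduces it:

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws, n_points = 4000, 50
alpha = rng.normal(0.0, 0.2, n_draws)    # stand-in for m_5_1_trace["alpha"]
beta_a = rng.normal(0.0, 0.5, n_draws)   # stand-in for m_5_1_trace["beta_a"]
x = rng.normal(0.0, 1.0, n_points)       # stand-in for the standardized ages

# what pm.Deterministic("mu", ...) records: one row per posterior draw
mu_draws = alpha[:, None] + beta_a[:, None] * x[None, :]

# the per-datapoint loop from above, one row per data point
mu_pred = np.array([alpha + beta_a * xi for xi in x])  # shape (n_points, n_draws)

# identical draws, identical arithmetic -> identical means
print(np.allclose(mu_pred.mean(axis=1), mu_draws.mean(axis=0)))  # True
```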
But for the same model, when I sample the posterior predictive distribution and average the predicted values, i.e. `posterior_pred["divorce_rate_hat"].mean(axis=0)`, I get very similar, but slightly different values.
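The same near-but-not-exact agreement shows up in a NumPy mock-up of the two averages (again, simulated stand-ins for the trace; the shapes mirror what I see from pymc3):

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws, n_points = 4000, 50
mu_draws = rng.normal(0.0, 1.0, (n_draws, n_points))  # stand-in for m_5_1_trace["mu"]
sigma = rng.exponential(1.0, n_draws)                 # stand-in for m_5_1_trace["sigma"]

# one fresh Normal(mu, sigma) draw per posterior draw,
# mimicking what sample_posterior_predictive returns
pred = rng.normal(mu_draws, sigma[:, None])

mu_mean = mu_draws.mean(axis=0)
pred_mean = pred.mean(axis=0)
print(np.abs(pred_mean - mu_mean).max())  # small but nonzero gap
```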
What causes the difference? I expected the values to be identical, so what am I missing?