Hi everyone!
I’m using the following regression model:
```python
with pm.Model() as p1Categorical:
    alpha = pm.Normal('alpha', mu=0, sd=10)
    beta = pm.Normal('beta', mu=0, sd=2, shape=nCat)
    sigma = pm.HalfCauchy('sigma', 1)
    mu = pm.Deterministic('mu', alpha + beta[p1idx])
    obs = pm.Normal('obs', mu=mu, sd=sigma, observed=p1['obs'].values)
    trace_p1Categorical = pm.sample()
```
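As a side note on what these priors imply: since `alpha ~ N(0, 10)` and each `beta` component is `N(0, 2)`, the prior on `mu = alpha + beta` should be centred on 0 with sd `sqrt(10² + 2²) ≈ 10.2`. A plain-numpy sketch of that expectation (not my actual model code, just the arithmetic):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
alpha = rng.normal(0, 10, n)  # prior on alpha: N(0, 10)
beta = rng.normal(0, 2, n)    # prior on one beta component: N(0, 2)
mu = alpha + beta             # implied prior on mu for one category
print(mu.mean(), mu.std())    # should be close to 0 and sqrt(104) ~= 10.2
```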
I intend to make the model hierarchical later, but I'm starting off by modelling only one group.
Now I want to visualise the data, the prior predictive, and the posterior predictive together to see what I'm doing:
```python
priorChecks = pm.sample_prior_predictive(samples=50000, model=p1Categorical)
ppc = pm.sample_posterior_predictive(trace_p1Categorical, 500,
                                     var_names=['alpha', 'beta', 'sigma'],
                                     model=p1Categorical)
```
```python
def p1Categoricalm(samples, kind, categories):
    x_cat = categories.categories
    x_codes = np.unique(categories.codes)
    y = np.zeros(0)
    x = np.zeros(0)
    # one (alpha, beta) pair per draw; build one predicted mean per category
    for a, b in zip(samples['alpha'], samples['beta']):
        y_temp = a + b[x_codes]
        x = np.concatenate((x, x_cat))
        y = np.concatenate((y, y_temp))
    return pd.DataFrame({'x': x, 'obs': y, 'kind': kind})
```
```python
prior_predictive = p1Categoricalm(priorChecks, 'prior pc', p1Cat)
posterior_predictive = p1Categoricalm(ppc, 'posterior pc', p1Cat)

real = p1.copy()[['obs', 'x']]
real['kind'] = 'data'

df = pd.concat((real, prior_predictive, posterior_predictive))
```
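Before plotting, a `groupby` summary would show whether the prior predictive draws themselves are centred on 0 with the expected spread, independently of whatever seaborn draws. A self-contained toy sketch with the same column names as my `df` (made-up numbers, just to show the check):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
# toy stand-in for df: 'obs' values per 'kind', columns as above
df = pd.DataFrame({
    'x': ['a'] * 200,
    'obs': np.r_[rng.normal(0, 10, 100), rng.normal(5, 1, 100)],
    'kind': ['prior pc'] * 100 + ['data'] * 100,
})
# per-kind mean and spread of the raw draws, no plotting involved
print(df.groupby('kind')['obs'].agg(['mean', 'std']))
```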
```python
import seaborn as sns

fig, axes = plt.subplots(figsize=(12, 5))
sns.lineplot(x="x", y="obs", hue="kind", data=df, ax=axes)
```
The data and the posterior predictive look as expected. But the prior predictive CI in the plot becomes narrower as I increase the number of prior predictive samples drawn. It is also not centred around 0, which it should be according to the prior definition. I've gone up to 50k prior predictive samples and run the whole thing several times, and I keep getting pretty much this:
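To be concrete about the narrowing: the band shrinks roughly like a standard error of the mean, i.e. with `1/sqrt(n)`, even though the spread of the draws themselves doesn't change. A quick numpy illustration of that scaling (not my plotting code):

```python
import numpy as np

rng = np.random.default_rng(1)
sems = []
for n in (500, 50_000):
    draws = rng.normal(0, 10, n)          # same spread regardless of n
    sem = draws.std(ddof=1) / np.sqrt(n)  # a mean-CI's half-width scales with this
    sems.append(sem)
    print(n, round(sem, 3))               # shrinks ~10x when n grows 100x
```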
What am I doing wrong or what did I misunderstand?