I’m following Osvaldo Martin’s ‘Bayesian Analysis with Python’ book and am having trouble with the section on robust linear regression.
The example uses the 3rd dataset from the Anscombe Quartet, with the goal of using a T distribution to make the model robust to the outlier.
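For reference, here's roughly how I'm setting things up. `x_3` and `y_3` are the dataset III values; I'm grabbing them from seaborn's built-in copy of the quartet (the values are the same as the book's), and these are the imports used below:

```python
import pymc3 as pm
import arviz as az
import matplotlib.pyplot as plt
import seaborn as sns

# Anscombe quartet, dataset III: a clean linear relationship plus one outlier
ans = sns.load_dataset('anscombe')
x_3 = ans[ans.dataset == 'III']['x'].values
y_3 = ans[ans.dataset == 'III']['y'].values
```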
I’ve generated the model with:

```python
with pm.Model() as anscombe3_model_t:
    # priors for the intercept, slope and scale
    α = pm.Normal('α', mu=y_3.mean(), sd=1)
    β = pm.Normal('β', mu=0, sd=1)
    ϵ = pm.HalfCauchy('ϵ', 5)
    # shifted exponential prior for the Student-T degrees of freedom
    ν_ = pm.Exponential('ν_', 1/29)
    ν = pm.Deterministic('ν', ν_ + 1)
    # Student-T likelihood so the outlier gets down-weighted
    y_pred = pm.StudentT('y_pred', mu=α + β * x_3, sd=ϵ, nu=ν, observed=y_3)
    trace_anscombe_t = pm.sample(2000, tune=1000)
```
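For what it's worth, the parameter sds I mention further down are read off a posterior summary, roughly:

```python
# posterior summary; this is where I read the sds of α, β and ϵ from
az.summary(trace_anscombe_t, var_names=['α', 'β', 'ϵ', 'ν']).round(2)
```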
Generating a fit line using the mean of the trace, I get what looks like the correct answer, i.e. a line that runs through all the points except the outlier.
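Concretely, the fit line comes from plugging the posterior means of α and β into the linear model, something like:

```python
# regression line implied by the posterior means of the intercept and slope
alpha_m = trace_anscombe_t['α'].mean()
beta_m = trace_anscombe_t['β'].mean()

plt.plot(x_3, y_3, 'C0o')
plt.plot(x_3, alpha_m + beta_m * x_3, c='k',
         label=f'y = {alpha_m:.2f} + {beta_m:.2f} x')
plt.xlabel('x_3')
plt.ylabel('y_3')
plt.legend()
plt.show()
```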
From this (and from the fact that the sds of α, β and ϵ are all 0.0), I would expect the posterior predictive check (PPC) to give a pretty good estimate of y. But the samples generated by the following code show y predictions (and a mean of the y predictions) that are way off what I would expect given how well the regression line fits, and also far from what Martin gets with the same code:

```python
# posterior predictive samples for y_pred, compared against the observed y_3
ppc = pm.sample_posterior_predictive(trace_anscombe_t, samples=1000, model=anscombe3_model_t)
data_ppc = az.from_pymc3(trace=trace_anscombe_t, posterior_predictive=ppc)
ax = az.plot_ppc(data_ppc, figsize=(8, 5), mean=True)
plt.xlim(0, 14)
plt.show()
```
(I can only upload one image, but Martin’s results show a y_pred mean on the same scale as the observed data, whereas mine sits way below it.)
Am I doing something wrong with my PPC?