Hello! I’m trying to solve the 3rd problem from the Statistical Rethinking course.
It uses the cherry_blossom dataset, which I’m sure is very familiar; you can find it here.
Below I replicated a model that estimates the mean day of the year as a linear function of temperature. I standardized both temperature and day of the year, but only after dropping the rows with NaN values from the cherry blossom dataframe.
import numpy as np

def standardize(x):
    # Center on the mean and scale by the standard deviation
    x = (x - np.mean(x)) / np.std(x)
    return x
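For context, this is roughly how the preprocessing described above could look. It is only a sketch: the rows are made-up stand-ins rather than real cherry blossom data, and the raw column names `doy` and `temp` and the name `df_raw` are assumptions on my part.

```python
import numpy as np
import pandas as pd

def standardize(x):
    # Center on the mean and scale by the standard deviation
    return (x - np.mean(x)) / np.std(x)

# Hypothetical stand-in rows (NOT real values from the dataset).
df_raw = pd.DataFrame({
    "doy": [100.0, 105.0, np.nan, 98.0, 110.0],
    "temp": [6.1, np.nan, 5.8, 6.9, 5.5],
})

# Drop rows with a missing value in either column *before* standardizing,
# so the mean and standard deviation come from the rows the model sees.
df_cherry = df_raw.dropna(subset=["doy", "temp"]).copy()
df_cherry["doy_std"] = standardize(df_cherry["doy"])
df_cherry["temp_std"] = standardize(df_cherry["temp"])

print(len(df_cherry))  # number of complete rows kept
```

Standardizing after the drop matters because otherwise the centering would use rows that never enter the model.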
import pymc as pm

with pm.Model() as m2:
    # Priors for intercept, slope, and residual scale
    a = pm.Normal("a", 0, 10)
    b = pm.Normal("b", 0, 10)
    sigma = pm.Exponential("sigma", 1)

    # Standardized temperature as the predictor
    pred = pm.MutableData('pred', df_cherry['temp_std'], dims="obs_id")

    # Linear model for the mean: one mu per observation
    mu = pm.Deterministic("mu", a + b*pred, dims="obs_id")
    D = pm.Normal('D', mu, sigma, observed=df_cherry['doy_std'], dims="obs_id")

    m2_trace = pm.sample(return_inferencedata=True)
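One thing worth noting about this model: `mu` is a vector-valued Deterministic, so the trace stores one value of `mu` per observation for every draw. Here is a small NumPy sketch of the resulting shapes. The draws are made up (including the slope near -0.3), not output from the model above, and the names `a_draws`, `b_draws`, and `mu_draws` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up stand-ins for posterior draws (NOT output of the model above).
n_draws, n_obs = 1000, 5
a_draws = rng.normal(0.0, 0.05, size=n_draws)
b_draws = rng.normal(-0.3, 0.05, size=n_draws)
pred = np.linspace(-2.0, 2.0, n_obs)  # hypothetical standardized temperatures

# mu = a + b * pred, evaluated for every draw and every observation.
mu_draws = a_draws[:, None] + b_draws[:, None] * pred[None, :]
print(mu_draws.shape)  # one column of draws per observation

# Each observation has its own distribution of mu, centered at a different value.
print(mu_draws.mean(axis=0))
```

If the plot overlays all of these per-observation distributions, that might account for the apparent spread. I have not confirmed this. Restricting the plot to the scalar parameters, for example with `az.plot_trace(m2_trace, var_names=["a", "b", "sigma"])`, would be one way to check.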
When I inspect the trace, the posterior distribution for the mean looks really off. As far as I can tell, the posterior shows a lot of variation between the samples.
Does anyone have any idea why this happens and how to fix it?