Hello! I’m trying to work through the 3rd problem from the Statistical Rethinking course.
It uses the cherry_blossom dataset, which I’m sure is very familiar; you can find it here.
Below I replicate a model that estimates the mean day of the year as a linear function of temperature. I standardized both temperature and day of the year, and only after dropping the NaN rows from the cherry blossom dataframe.
    def standardize(x):
        x = (x - np.mean(x)) / np.std(x)
        return x
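To make the preprocessing concrete, here is a minimal sketch of the order of operations described above (drop NaN rows first, standardize afterwards). The tiny dataframe and its values are made up for illustration, and the raw column names `temp` and `doy` are an assumption; only `temp_std` and `doy_std` appear in the model code.

```python
import numpy as np
import pandas as pd


def standardize(x):
    # Center on the mean and scale by the (population) standard deviation.
    return (x - np.mean(x)) / np.std(x)


# Toy stand-in for the cherry blossom data (values are invented).
df_raw = pd.DataFrame(
    {
        "doy": [92.0, np.nan, 105.0, 100.0, 98.0],
        "temp": [6.1, 6.4, np.nan, 5.8, 6.9],
    }
)

# Drop rows with a missing value in either column *before* standardizing,
# so the mean and standard deviation are computed on the rows actually used.
df_cherry = df_raw.dropna(subset=["doy", "temp"]).reset_index(drop=True)
df_cherry["temp_std"] = standardize(df_cherry["temp"].to_numpy())
df_cherry["doy_std"] = standardize(df_cherry["doy"].to_numpy())

print(df_cherry)
```

After this step both standardized columns should have mean close to 0 and standard deviation close to 1, which is a quick sanity check before fitting.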
> with pm.Model() as m2:
>     a = pm.Normal("a", 0, 10)
>     b = pm.Normal("b", 0, 10)
>     sigma = pm.Exponential("sigma", 1)
>
>     pred = pm.MutableData('pred', df_cherry['temp_std'], dims="obs_id")
>
>     mu = pm.Deterministic("mu", a + b*pred, dims="obs_id")
>     D = pm.Normal('D', mu, sigma, observed=df_cherry['doy_std'], dims="obs_id")
>
>     m2_trace = pm.sample(return_inferencedata=True)
When I inspect the trace, the posterior distribution for the mean (`mu`) looks really off. As far as I can tell, the posterior shows a lot of variation between the samples.
Does anyone have any idea why this happens and how to fix it?