Posterior distribution of estimated parameter has a lot of variation

rodix · March 23, 2023, 5:10pm

Hello! I’m trying to work through the 3rd problem from the Statistical Rethinking course.
It uses the cherry_blossom dataset, which I’m sure is very familiar; you can find it here.

Below I replicated a model that estimates the mean day of the year as a linear function of temperature. I standardized both temperature and day of the year (only after dropping the NaN rows from the cherry blossom dataframe).

import numpy as np
import pymc as pm

def standardize(x):
    # Center to mean 0 and scale to standard deviation 1
    return (x - np.mean(x)) / np.std(x)

with pm.Model() as m2:
    a = pm.Normal("a", 0, 10)
    b = pm.Normal("b", 0, 10)
    sigma = pm.Exponential("sigma", 1)

    pred = pm.MutableData("pred", df_cherry["temp_std"], dims="obs_id")

    mu = pm.Deterministic("mu", a + b * pred, dims="obs_id")
    D = pm.Normal("D", mu, sigma, observed=df_cherry["doy_std"], dims="obs_id")

    m2_trace = pm.sample(return_inferencedata=True)

When I inspect the trace, the posterior distribution for the mean looks really off. From my understanding it looks like the posterior has a lot of variation between the samples.

Does anyone have any idea why this happens and how to fix it?

cluhmann · March 23, 2023, 5:49pm

Welcome!

When you plot the posterior for mu (the bottom panel), you are looking at a set of posteriors, one per observation (each a different color). So the variation you see in the bottom panel reflects the fact that a and b are pretty certain, but that df_cherry['temp_std'] (via pred) likely varies quite a bit across observations. Is that clearer?
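To make that concrete, here is a minimal sketch in plain Python (no PyMC needed). The draws for a and b and the temp_std values are simulated stand-ins, not the actual cherry blossom posterior, and the numbers chosen for them are assumptions for illustration only. It shows that mu holds one posterior per observation, and that the spread between those posteriors can be much larger than the spread within any one of them.

```python
import random
import statistics

random.seed(0)

# Hypothetical stand-ins: tight posterior draws for a and b, and a
# standardized predictor with 50 observations (not the real data).
n_draws, n_obs = 1000, 50
a_draws = [random.gauss(0.0, 0.03) for _ in range(n_draws)]
b_draws = [random.gauss(-0.3, 0.03) for _ in range(n_draws)]
temp_std = [random.gauss(0.0, 1.0) for _ in range(n_obs)]

# mu has one value per (draw, observation), so a trace plot of mu
# shows one posterior curve per observation.
mu = [[a + b * x for x in temp_std] for a, b in zip(a_draws, b_draws)]

# Spread of mu across draws, computed separately for each observation.
per_obs_sd = [statistics.pstdev([row[i] for row in mu]) for i in range(n_obs)]

# Spread of the per-observation posterior means across observations.
per_obs_mean = [statistics.fmean([row[i] for row in mu]) for i in range(n_obs)]
between_obs_sd = statistics.pstdev(per_obs_mean)

print(f"sd of a draws: {statistics.pstdev(a_draws):.3f}")
print(f"largest within-observation sd of mu: {max(per_obs_sd):.3f}")
print(f"sd of mu means across observations: {between_obs_sd:.3f}")
```

If you only want to look at a, b and sigma, ArviZ's plot_trace accepts a var_names argument, so something like az.plot_trace(m2_trace, var_names=["a", "b", "sigma"]) should leave mu out of the plot.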

rodix · March 24, 2023, 9:16am

Yes, that makes more sense, thanks for explaining. I just realized that the posterior for mu was not usually plotted by default when inspecting the trace (I upgraded my PyMC version), so I wasn’t used to seeing this plot. I was expecting to see only the distributions for the parameters I put priors on, and I had certain expectations about how those should look.
Thanks!
