Getting Point Estimates from Posterior to add to the Data Frame

Hello. At one point, I asked how to get the mean of each posterior predictive observation out of the ppc check. So after I run posterior_samples = pm.sample_posterior_predictive(trace) I was told to run the following:

np.mean(posterior_samples['obs_name'], axis=0)

That was from this post (How to Pull Point Estimates Out of Posterior Check - Questions - PyMC Discourse).

This runs, but it isn’t what I’m looking for. My dataframe has 1,774 observations, and I’m trying to get 1,774 mean values, one per observation, from the 1,000 samples drawn. When I run the above, the shape of my array is (1000, 1774). I’m trying to get an array with shape (1774, ) to add to my data frame (and eventually pull out the 5% and 95% quantiles to add as well).

I’m sure this is a Python syntax issue somewhere in my code. Does anyone know how to do this?

I don’t know what posterior_samples is. I assume it is the posterior group of the generated InferenceData; let me know if that is not the case and this doesn’t solve your question. You should find the answer to your question and similar ones in the “Working with InferenceData” page of the ArviZ dev documentation.

Thank you. I edited the question to clarify that. I checked out the “Working with InferenceData” page of the ArviZ dev documentation. I moved the predicted values to their own array using:

`x = posterior_predictive.posterior_predictive.predicted_sales`

Then followed the instructions by taking the mean using:

x.mean()

This gives me shape (6,1000,1774)

How do I take a final mean value for each observation that was sampled?

Here you are selecting a single variable out of the posterior predictive dataset.

So x.mean() should return a scalar; the shape you mention is the original one, before taking the mean. This is one of the things shown in the guide I linked above, specifically in the “Compute posterior mean values along draw and chain dimensions” section. You want to reduce only the chain and draw dimensions, so you need to tell the function which dimensions to reduce (by default it reduces all of them). This works for mean and for other DataArray or Dataset methods, such as quantile.


I really want you (and anyone else) to read the guide, so I try to avoid answering questions directly with “copy pasteable” code, for several reasons.

I am one of the authors of that page, and I dedicated a lot of time to adding the examples and explanations and making sure they were clear. I can only dedicate a very small fraction of that time to answering questions here, so you will really be better off (or should be) reading that.

At the same time, I know it is not perfect, and we are always trying to improve that page and the documentation in general. That page in particular has already had several major improvements after we realized some examples were missing or not clear enough. Having the answer here makes it available to everyone, but it is still very hard to find. The best way to make sure other people don’t run into the same issues you are having is to improve that page (and add links to it wherever necessary).

My apologies. I did miss this part…post.mean(dim=['chain', 'draw']).
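
For anyone who finds this later, here is a minimal sketch of how that reduction might look end to end. It uses a small synthetic array in place of the real predicted_sales variable; the array sizes, the "obs_id" dimension name, and the data frame columns are made up for illustration and are not from my actual model.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for posterior_predictive.posterior_predictive.predicted_sales.
# Sizes and the "obs_id" dimension name are assumptions for this example.
rng = np.random.default_rng(0)
n_chain, n_draw, n_obs = 2, 50, 10
x = xr.DataArray(
    rng.normal(size=(n_chain, n_draw, n_obs)),
    dims=("chain", "draw", "obs_id"),
    name="predicted_sales",
)

# Reduce only the sampling dimensions, leaving one value per observation.
post_mean = x.mean(dim=["chain", "draw"])

# quantile accepts the same dim argument; the result gains a "quantile" dimension.
bounds = x.quantile([0.05, 0.95], dim=["chain", "draw"])

# Hypothetical data frame with one row per observation.
df = pd.DataFrame({"sales": np.arange(n_obs)})
df["pred_mean"] = post_mean.values
df["pred_q05"] = bounds.sel(quantile=0.05).values
df["pred_q95"] = bounds.sel(quantile=0.95).values
```

With the real data, post_mean should have one entry per observation (1,774 in my case), which is the shape needed for a new data frame column.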