Converting Posterior Samples into Data Frame

Hello,

I’m trying an experiment in which I need the posterior samples in a data frame of 100,000 rows by 5,000 columns, for a posterior consisting of 5,000 samples. The sampling runs, but the code puts all of the samples into a one-column data frame. See code and data below.

import numpy as np
import pandas as pd
import pymc3 as pm

posterior_samples_train = pm.sample_posterior_predictive(trace, model = model, samples = 5000)
posterior_samples_train = np.asarray(posterior_samples_train['y'])
posterior_samples_train = pd.DataFrame(posterior_samples_train)
posterior_samples_train

I want the ‘y’ values instead of the posterior coefficient samples I get from

pm.trace_to_dataframe

Any ideas on how to create a dataframe of the ‘y’ samples?

Jordan
Dataframes don’t tend to work well with many types of Bayesian models. Because of this, ArviZ supports converting inference results into xarray Datasets.

My suggestion would be to use az.from_pymc3 to get your results into a nice data container.

https://arviz-devs.github.io/arviz/generated/arviz.from_pymc3.html#arviz.from_pymc3

Thank you @RavinKumar. I’m done with the Bayesian model. I want to take the posterior ‘y’ value samples and do something else with them. That’s why I was hoping to get a data frame. I can get a data frame with a point estimate from the posterior by using

pd.DataFrame(posterior_samples_train['y'].mean(axis = 0))

There’s no way to get all of the samples out?

What’s the shape of posterior_samples_train['y'] right after sampling?

Hi @lucianopaz.

To answer your question, the shape is (161533, 5000, 1).

But I think I just figured out how to get what I’m looking for. The code below works, though there may be a more efficient way to do this.

posterior_samples_train = np.asarray(posterior_samples_train['y'])
posterior_samples_train = np.squeeze(posterior_samples_train, axis = 2)
posterior_samples_train = pd.DataFrame(posterior_samples_train)
posterior_samples_train = posterior_samples_train.transpose()

This should be equivalent to

posterior_samples_train = pd.DataFrame(posterior_samples_train['y'].squeeze().T)
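To check the equivalence without rerunning the sampler, here is a minimal sketch on a small random array. The name y_samples is a hypothetical stand-in for posterior_samples_train['y']; it only mimics the axis layout reported above (two real axes plus a trailing axis of length 1), with tiny sizes.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for posterior_samples_train['y']: same axis layout
# as the shape reported in the thread, but with tiny sizes.
rng = np.random.default_rng(0)
y_samples = rng.normal(size=(6, 4, 1))

# Step-by-step version: drop axis 2, build the frame, then transpose.
long_way = pd.DataFrame(np.squeeze(np.asarray(y_samples), axis=2)).transpose()

# One-line version: squeeze every length-1 axis, transpose, build the frame.
short_way = pd.DataFrame(y_samples.squeeze().T)

# Variant that names the axis, so only the trailing length-1 axis is dropped.
safe_way = pd.DataFrame(y_samples.squeeze(axis=-1).T)

# All three frames have the first two axes swapped and hold the same values.
same_values = (
    np.array_equal(long_way.to_numpy(), short_way.to_numpy())
    and np.array_equal(short_way.to_numpy(), safe_way.to_numpy())
)
print(long_way.shape, same_values)
```

One caveat: squeeze() with no axis argument drops every axis of length 1, so if one of the other axes ever has length 1 the result would lose that axis too. Passing axis=-1 (or axis=2, as in the longer version) avoids that.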

Thank you @colcarroll.