Converting Posterior Samples into Data Frame

jordan.howell2 · February 7, 2019, 3:37pm

Hello,

I’m trying an experiment in which I need the posterior samples in a data frame of 100,000 rows by 5,000 columns, for a posterior consisting of 5,000 samples. The sampling runs, but the code puts all of the samples into a single-column data frame. See the code and data below.

posterior_samples_train = pm.sample_posterior_predictive(trace, model = model, samples = 5000)
posterior_samples_train = np.asarray(posterior_samples_train['y'])
posterior_samples_train = pd.DataFrame(posterior_samples_train)
posterior_samples_train

I want the ‘y’ values instead of the posterior coefficient samples I get from

pm.trace_to_dataframe

Any ideas on how to create a dataframe of the ‘y’ samples?

RavinKumar · February 7, 2019, 5:07pm

Jordan,
Dataframes don’t tend to work well with many types of Bayesian models. Because of this, ArviZ supports converting inference results into xarray Datasets.

My suggestion would be to use az.from_pymc3 to get your results into a nice data container:

https://arviz-devs.github.io/arviz/generated/arviz.from_pymc3.html#arviz.from_pymc3

jordan.howell2 · February 7, 2019, 5:22pm

Thank you @RavinKumar. I’m done with the Bayesian model. I want to take the posterior ‘y’ value samples and do something else with them. That’s why I was hoping to get a data frame. I can get a data frame with a point estimate from the posterior by using

pd.DataFrame(posterior_samples_train['y'].mean(axis = 0))

Is there no way to get all of the samples out?

lucianopaz · February 7, 2019, 6:56pm

What’s the shape of posterior_samples_train['y'] right after sampling?

jordan.howell2 · February 7, 2019, 7:01pm

Hi @lucianopaz.

To answer your question, the shape is (161533, 5000, 1).

But I think I just figured out how to get what I’m looking for. The code below works, unless there is a more efficient way to do this.

posterior_samples_train = np.asarray(posterior_samples_train['y'])
posterior_samples_train = np.squeeze(posterior_samples_train, axis = 2)
posterior_samples_train = pd.DataFrame(posterior_samples_train)
posterior_samples_train = posterior_samples_train.transpose()

colcarroll · February 7, 2019, 9:25pm

This should be equivalent to

posterior_samples_train = pd.DataFrame(posterior_samples_train['y'].squeeze().T)
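
To check the equivalence without running a model, here is a minimal sketch using a small random array as a stand-in for posterior_samples_train['y']. The array `y`, its shape (6, 4, 1), and the names `step_df` and `one_liner_df` are made up for illustration; only the trailing length-1 axis mirrors the shape reported above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for posterior_samples_train['y']:
# any 3-D array whose last axis has length 1.
rng = np.random.default_rng(0)
y = rng.normal(size=(6, 4, 1))

# Step-by-step version: drop the trailing axis, build the frame, transpose.
step = np.squeeze(np.asarray(y), axis=2)      # shape (6, 4)
step_df = pd.DataFrame(step).transpose()      # shape (4, 6)

# One-liner version: squeeze, transpose the array, then build the frame.
one_liner_df = pd.DataFrame(y.squeeze().T)    # shape (4, 6)

print(step_df.shape, one_liner_df.shape)
print(np.array_equal(step_df.to_numpy(), one_liner_df.to_numpy()))
```

One caveat: .squeeze() with no axis argument removes every length-1 axis, so the two versions only agree when the trailing axis is the only one of length 1. Passing axis=2 (or axis=-1) is the safer choice if another dimension could ever be 1.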

jordan.howell2 · February 8, 2019, 1:42pm

Thank you @colcarroll.
