Hello,
I’m trying an experiment in which I need the posterior samples put into a data frame of 100,000 rows by 5,000 columns for a posterior consisting of 5,000 samples. The sampling runs, but the code puts all of the samples into a one-column data frame. See code and data below.
posterior_samples_train = pm.sample_posterior_predictive(trace, model = model, samples = 5000)
posterior_samples_train = np.asarray(posterior_samples_train['y'])
posterior_samples_train = pd.DataFrame(posterior_samples_train)
posterior_samples_train
I want the ‘y’ values instead of the posterior coefficient samples I get from
pm.trace_to_dataframe
Any ideas on how to create a dataframe of the ‘y’ samples?
Jordan
Dataframes don’t tend to work well with many types of Bayesian models. Because of this, ArviZ supports converting inference results into xarray Datasets.
My suggestion would be to use az.from_pymc3
to get your results into a nice data container
https://arviz-devs.github.io/arviz/generated/arviz.from_pymc3.html#arviz.from_pymc3
Thank you @RavinKumar. I’m done with the Bayesian model. I want to take the posterior ‘y’ value samples and do something else with them. That’s why I was hoping to get a data frame. I can get a data frame with a point estimate from the posterior by using
pd.DataFrame(posterior_samples_train['y'].mean(axis = 0))
There’s no way to get all of the samples out?
What’s the shape of posterior_samples_train['y']
right after sampling?
Hi @lucianopaz.
To answer your question, the shape is (161533, 5000, 1).
But I think I just figured out how to get what I’m looking for. The code below works, unless there is a more efficient way to do this.
posterior_samples_train = np.asarray(posterior_samples_train['y'])
posterior_samples_train = np.squeeze(posterior_samples_train, axis = 2)
posterior_samples_train = pd.DataFrame(posterior_samples_train)
posterior_samples_train = posterior_samples_train.transpose()
This should be equivalent to
posterior_samples_train = pd.DataFrame(posterior_samples_train['y'].squeeze().T)
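For anyone who wants to check the equivalence without rerunning the model, here is a minimal sketch on a small synthetic array. The name `y_samples` and the toy shape `(6, 4, 1)` are assumptions for illustration only; the array stands in for `posterior_samples_train['y']` with the axis order reported above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for posterior_samples_train['y']: same axis layout
# as the (161533, 5000, 1) array in the thread, but much smaller.
rng = np.random.default_rng(0)
y_samples = rng.normal(size=(6, 4, 1))

# Step-by-step form: drop only the trailing length-1 axis, then transpose.
explicit = pd.DataFrame(np.squeeze(y_samples, axis=2)).transpose()

# One-line form: squeeze() with no axis drops every length-1 axis.
compact = pd.DataFrame(y_samples.squeeze().T)

print(explicit.shape)
print(np.array_equal(explicit.to_numpy(), compact.to_numpy()))
```

One difference to keep in mind: `squeeze()` without an `axis` argument removes every length-1 axis, so the two forms only agree when the trailing axis is the only one of length 1. Passing `axis=2` is the safer choice if another dimension could ever have size 1.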