Converting posterior samples into a dataframe

Ansul · June 26, 2023, 10:42am

Hi,
I am trying to convert the trace into a dataframe. I used to use
file_read_in = trace.to_dataframe(groups="posterior", include_coords=False), but it seems to_dataframe does not work with a MultiTrace object.
I have now tried the following code:

file_read_in=az.from_pymc3(trace=trace).to_dataframe(include_coords=False)

This creates a dataframe whose column names contain the word "posterior" and parentheses "()".

That prevents me from filtering for specific variables when I reshape the dataframe to long format and filter:

df_sum_parts= pd.melt(file_read_in, id_vars = ['chain', 'draw'], var_name = 'var_name', value_name = 'value') 
df_sum_parts= df_sum_parts[df_sum_parts['var_name'].str.contains('decay_')]

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/tmp/ipykernel_322/2409040180.py in <module>
      1 #df_sum_parts = pd.melt(file_read_in, id_vars = ['chain', 'draw'], var_name = 'var_name', value_name = 'incr_items')
----> 2 df_sum_parts = df_sum_parts[df_sum_parts['var_name'].str.contains('decay_')]
      3 df_sum_parts.tail()

/opt/conda/lib/python3.8/site-packages/pandas/core/frame.py in __getitem__(self, key)
   3509             if is_iterator(key):
   3510                 key = list(key)
-> 3511             indexer = self.columns._get_indexer_strict(key, "columns")[1]
   3512 
   3513         # take() does not accept boolean indexers

/opt/conda/lib/python3.8/site-packages/pandas/core/indexes/base.py in _get_indexer_strict(self, key, axis_name)
   5780             keyarr, indexer, new_indexer = self._reindex_non_unique(keyarr)
   5781 
-> 5782         self._raise_if_missing(keyarr, indexer, axis_name)
   5783 
   5784         keyarr = self.take(indexer)

/opt/conda/lib/python3.8/site-packages/pandas/core/indexes/base.py in _raise_if_missing(self, key, indexer, axis_name)
   5840                 if use_interval_msg:
   5841                     key = list(key)
-> 5842                 raise KeyError(f"None of [{key}] are in the [{axis_name}]")
   5843 
   5844             not_found = list(ensure_index(key)[missing_mask.nonzero()[0]].unique())

KeyError: "None of [Float64Index([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n              ...\n              nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],\n             dtype='float64', length=39424000)] are in the [columns]"

Could someone please tell me the way to convert trace to a pandas dataframe and then use filtering techniques like the one mentioned above?

cluhmann · June 27, 2023, 12:46pm

Do you not have the trace object at runtime? You only have it as a file saved to disk?

Ansul · July 18, 2023, 10:23am

I was able to convert the trace into a dataframe with the following command -

az.convert_to_inference_data(obj=trace).to_dataframe(include_coords=False, groups="posterior")

This creates a nice dataframe that does not have the word "posterior" or parentheses "()" in the column names.

OriolAbril · July 18, 2023, 11:48am

The important part is using the groups keyword in to_dataframe. convert_to_inference_data calls from_pymc3 internally, and it is recommended to use from_pymc3 directly instead.
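
For a frame that was already created without the groups keyword, the column labels for the variables appear to be tuples such as ("posterior", "decay_a") rather than plain strings, which would explain why the string filter in the question matched nothing. The sketch below is only an illustration of one workaround under that assumption. The frame is built by hand to mimic that layout, the variable names and values are made up, and posterior_columns is a hypothetical helper, not part of ArviZ or pandas.

```python
import pandas as pd

# Hand-built stand-in for the frame described in the question:
# "chain" and "draw" are plain strings, the other labels are tuples.
# Variable names and values are invented for illustration only.
wide = pd.DataFrame(
    [[0, 0, 0.10, 1.5, -3.2],
     [0, 1, 0.20, 1.7, -3.0]]
)
wide.columns = [
    "chain",
    "draw",
    ("posterior", "decay_a"),
    ("posterior", "beta"),
    ("log_likelihood", "y"),
]


def posterior_columns(df, group="posterior", id_cols=("chain", "draw")):
    """Hypothetical helper: keep the id columns plus one group's
    variables, and rename the tuple labels to plain strings."""
    positions, names = [], []
    for i, col in enumerate(df.columns):
        if isinstance(col, tuple):
            # Tuple labels are assumed to look like (group, variable_name).
            if col[0] == group:
                positions.append(i)
                names.append(str(col[1]))
        elif col in id_cols:
            positions.append(i)
            names.append(col)
    out = df.iloc[:, positions].copy()
    out.columns = names
    return out


tidy = posterior_columns(wide)

# With plain string names, the melt-and-filter step from the question works.
long = tidy.melt(id_vars=["chain", "draw"], var_name="var_name", value_name="value")
decay = long[long["var_name"].str.contains("decay_")]
print(decay)
```

When the trace is still available, passing groups="posterior" to to_dataframe as described above avoids the need for any renaming.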
