How to retrieve information from inference data?

I am trying to retrieve information from the groups of an InferenceData object and present it as a DataFrame.

Let me give you a short example:

import pymc as pm
import pandas as pd

samples = 100
trials = 10
successes = 7

with pm.Model() as model:
    p = pm.Beta("p", alpha=0.5, beta=0.5)
    likelihood = pm.Binomial("likelihood", p=p, n=trials, observed=successes)
    model_prior = pm.sample_prior_predictive(samples=samples)

df_prior = pd.DataFrame(model_prior, columns=["prior groups"])
df_prior

This gives me the output I expected:

prior groups
0 prior
1 prior_predictive
2 observed_data

However, there is much more information in the groups of the model_prior dataset, in particular variable names and array shapes…

Do you know if it is possible to retrieve more information from these groups, for example to get the following output:

   prior groups      variables   shapes
0  prior             p           (1, 100)
1  prior_predictive  likelihood  (1, 100)
2  observed_data     likelihood  (1,)

Or, in other words, is there a way to “get inside” and “manipulate” the arviz.InferenceData?

Poor formatting, but this will sort of get you what you want:

for group in ["prior", "prior_predictive", "observed_data"]:
    print(f"{group}")
    for var in list(model_prior[group].keys()):
        print(f"  {var}: {model_prior[group][var].shape}")

But in general, you would probably not want to do this because there could be a large number of variables in each group. On top of that, you are using shapes rather than the preferred coords/dims that you see when you inspect an individual group (see here and here for more info on using dimensions).
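If you do want a table anyway, the loop above can be turned into a small DataFrame that records the dimension names next to the shapes. This is only a rough sketch: summarize_groups is a made-up helper, and a plain dict of xarray Datasets with random values stands in for model_prior so the snippet runs without sampling. With the real object you could probably build the same dict with something like {g: model_prior[g] for g in model_prior.groups()}, assuming your ArviZ version exposes groups().

```python
import numpy as np
import pandas as pd
import xarray as xr


def summarize_groups(groups):
    """Hypothetical helper: one row per (group, variable) with dims and shape."""
    rows = []
    for group_name, dataset in groups.items():
        for var_name, data_array in dataset.data_vars.items():
            rows.append(
                {
                    "group": group_name,
                    "variable": var_name,
                    "dims": data_array.dims,    # named dimensions
                    "shape": data_array.shape,  # plain array shape
                }
            )
    return pd.DataFrame(rows)


# Stand-in data (assumption): 1 chain, 100 draws, random values.
rng = np.random.default_rng(0)
coords = {"chain": [0], "draw": np.arange(100)}
groups = {
    "prior": xr.Dataset(
        {"p": (("chain", "draw"), rng.beta(0.5, 0.5, size=(1, 100)))},
        coords=coords,
    ),
    "prior_predictive": xr.Dataset(
        {"likelihood": (("chain", "draw"), rng.binomial(10, 0.5, size=(1, 100)))},
        coords=coords,
    ),
    # A scalar observation, stored here as a zero-dimensional array.
    "observed_data": xr.Dataset({"likelihood": xr.DataArray(7)}),
}

summary = summarize_groups(groups)
print(summary)
```

Keeping the dims column next to the shapes means the table still says which axis is chain and which is draw.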

Inspecting model_prior.prior yields:

<xarray.Dataset>
Dimensions:  (chain: 1, draw: 100)
Coordinates:
  * chain    (chain) int64 0
  * draw     (draw) int64 0 1 2 3 4 5 6 7 8 9 ... 90 91 92 93 94 95 96 97 98 99
Data variables:
    p        (chain, draw) float64 5.762e-05 0.4165 0.99 ... 0.3721 0.9852
Attributes:
    created_at:                 2023-03-29T15:06:39.322939
    arviz_version:              0.14.0
    inference_library:          pymc
    inference_library_version:  5.0.0

In particular, this tells you that p has 2 dimensions (chain and draw), that chain has 1 coordinate value, and that draw has 100. Knowing that there are 2 dimensions and what those dimensions are typically gives you much more information than something like p: (1, 100).
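To make that concrete, here is a minimal sketch of what the named dimensions let you do. It assumes a Dataset shaped like the model_prior.prior output above; the values are random stand-ins, not real prior draws, and the variable names are mine.

```python
import numpy as np
import xarray as xr

# Stand-in for model_prior.prior (assumption): 1 chain, 100 draws.
rng = np.random.default_rng(0)
prior = xr.Dataset(
    {"p": (("chain", "draw"), rng.beta(0.5, 0.5, size=(1, 100)))},
    coords={"chain": [0], "draw": np.arange(100)},
)

p = prior["p"]
print(p.dims)         # names of the dimensions
print(dict(p.sizes))  # length of each named dimension

# Label-based selection: no need to remember which axis is which.
# Label slices include both endpoints, so this keeps draws 0 through 9.
first_ten = p.sel(chain=0, draw=slice(0, 9))
print(first_ten.shape)

# Long-format table indexed by (chain, draw), handy for slides or reports.
df = p.to_dataframe()
print(df.head())
```

The to_dataframe() call is one way to get back to pandas once you have picked the variable you care about.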

Hi Christian, I’ll take a closer look at the two documents you sent me, and I think I’ll find what I’m looking for there.

For the moment, I do not yet know how to handle these coords and dims properly, so these docs will certainly be a great help for what I had in mind: presenting some results in a condensed and concise way, for example on slides for a PowerPoint presentation.

I’d highly recommend spending some time going over one (or more) of the example notebooks, since they show best practice. The GLM examples are a good place to start. Though not all of them require coords, virtually all of them show how to use the information in the returned InferenceData.

You might also find “Working with InferenceData” in the ArviZ 0.15.1 documentation useful.