How to retrieve information from inference data?

I am trying to retrieve information from the groups included in an inference data set and to present it in a DataFrame format.

Let me give you a short example:

import pymc as pm
import pandas as pd

samples = 100
trials = 10
successes = 7

with pm.Model() as model:
    p = pm.Beta("p", alpha=0.5, beta=0.5)
    likelihood = pm.Binomial("likelihood", p=p, n=trials, observed=successes)
    model_prior = pm.sample_prior_predictive(samples=samples)

df_prior = pd.DataFrame(model_prior, columns=["prior groups"])

This gives me the output I expected:

prior groups
0 prior
1 prior_predictive
2 observed_data

However, there is much more information in the groups of the model_prior dataset, in particular variable names and array shapes…

Do you know if it is possible to retrieve more information from these groups, for example to get the following output:

       prior groups   variables    shapes
0             prior           p  (1, 100)
1  prior_predictive  likelihood  (1, 100)
2     observed_data  likelihood      (1,)

Or, in other words, is there a way to “get inside” and “manipulate” the arviz.InferenceData?

Poor formatting, but this will sort of get you what you want:

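for group in ["prior", "prior_predictive", "observed_data"]:
    # Print the group name as a header so the variables below it can be told apart.
    print(group)
    # data_vars skips coordinate variables such as chain and draw.
    for var in model_prior[group].data_vars:
        print(f"  {var}: {model_prior[group][var].shape}")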

But in general, you would probably not want to do this because there could be a large number of variables in each group. On top of that, you are using shapes rather than the preferred coords/dims that you see when you inspect an individual group (see here and here for more info on using dimensions).
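If you do want the table from the question, here is a minimal sketch that collects one row per group and variable and records the dims next to the shape. The helper name summarize_groups is made up for illustration, and the demo data is a stand-in built directly with xarray (made-up values, and a hypothetical obs_id dimension for the observed data) so that the snippet runs without sampling a model.

```python
import numpy as np
import pandas as pd
import xarray as xr


def summarize_groups(groups):
    """Return one row per (group, variable) with its dims and shape.

    `groups` is a mapping from group name to an xarray Dataset.
    (Hypothetical helper, not part of PyMC or ArviZ.)
    """
    rows = []
    for group_name, dataset in groups.items():
        # data_vars excludes coordinate variables such as chain and draw.
        for var_name, data_array in dataset.data_vars.items():
            rows.append(
                {
                    "group": group_name,
                    "variable": var_name,
                    "dims": data_array.dims,
                    "shape": data_array.shape,
                }
            )
    return pd.DataFrame(rows, columns=["group", "variable", "dims", "shape"])


# Stand-in data shaped like the example in the question (values are made up).
rng = np.random.default_rng(0)
demo_groups = {
    "prior": xr.Dataset(
        {"p": (("chain", "draw"), rng.beta(0.5, 0.5, size=(1, 100)))}
    ),
    "prior_predictive": xr.Dataset(
        {"likelihood": (("chain", "draw"), rng.binomial(10, 0.5, size=(1, 100)))}
    ),
    # "obs_id" is a hypothetical dimension name chosen for this sketch.
    "observed_data": xr.Dataset({"likelihood": (("obs_id",), np.array([7]))}),
}

summary = summarize_groups(demo_groups)
print(summary)
```

With the real object, something like summarize_groups({g: model_prior[g] for g in ["prior", "prior_predictive", "observed_data"]}) should work, assuming each group indexes to an xarray Dataset as in the loop above.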

Inspecting model_prior.prior yields:

Dimensions:  (chain: 1, draw: 100)
Coordinates:
  * chain    (chain) int64 0
  * draw     (draw) int64 0 1 2 3 4 5 6 7 8 9 ... 90 91 92 93 94 95 96 97 98 99
Data variables:
    p        (chain, draw) float64 5.762e-05 0.4165 0.99 ... 0.3721 0.9852
Attributes:
    created_at:                 2023-03-29T15:06:39.322939
    arviz_version:              0.14.0
    inference_library:          pymc
    inference_library_version:  5.0.0

In particular, this tells you that p has 2 dimensions (chain and draw), that chain has 1 coordinate value, and that draw has 100. Knowing that there are 2 dimensions, and what those dimensions are, typically tells you much more than something like p: (1, 100).
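To make the benefit of named dimensions concrete, here is a small sketch using plain xarray. The dataset is a stand-in with made-up values that mirrors the structure printed above; with the real object you would start from model_prior.prior instead.

```python
import numpy as np
import xarray as xr

# Stand-in for model_prior.prior: one chain, 100 draws of p (made-up values).
rng = np.random.default_rng(1)
prior = xr.Dataset(
    {"p": (("chain", "draw"), rng.beta(0.5, 0.5, size=(1, 100)))},
    coords={"chain": [0], "draw": np.arange(100)},
)

print(prior["p"].dims)    # names of the dimensions of p
print(dict(prior.sizes))  # length of each dimension

# Reduce over named dimensions instead of positional axes.
p_mean = prior["p"].mean(dim=("chain", "draw"))

# Select by coordinate label: draws 0 through 9 of chain 0.
# Label-based slices in .sel include both endpoints.
first_ten = prior["p"].sel(chain=0, draw=slice(0, 9))
print(first_ten.shape)
```

Because the operations refer to chain and draw by name, the same code keeps working if the array gains extra dimensions or if the axis order changes.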

Hi Christian, I’ll take a closer look at the two documents you sent me, and I think I’ll find what I’m looking for there.

For the moment, I do not yet know how to handle these coords and dims properly, so these docs will certainly be a great help for what I had in mind: presenting some results in a condensed and concise way, for example on slides for a PowerPoint presentation.

I’d highly recommend spending some time going over one (or more) of the example notebooks. They provide examples of best practice. I’d recommend taking a look at one of the GLM examples. Though not all of them require coords, virtually all of them show examples of using the information in the returned InferenceData.

You might also find "Working with InferenceData" (ArviZ 0.15.1 documentation) useful.