How to combine mutually independent traces

Say I have two InferenceData objects from two separate calls to pm.sample in two separate models.

Say these contain posteriors for theta_1 and theta_2 respectively. These are mutually independent (e.g. theta_1 is the proportion of cats with tails, theta_2 is the proportion of people called Hugh).

How would I combine these two to create an InferenceData object with both posteriors in it, as if I'd called pm.sample on a model with both (so that I can feed it into a posterior predictive involving both parameters)?

You can combine two inference data objects via

https://python.arviz.org/en/latest/api/generated/arviz.concat.html#arviz.concat

If you specify dim="chain", for instance, it will concatenate overlapping groups along the chain dimension.
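To illustrate what concatenating along the chain dimension means, here is a toy sketch where plain nested lists stand in for the real posterior arrays (no arviz needed; the numbers are made up):

```python
# Each "posterior" holds draws of the SAME variable from two separate runs,
# as nested lists shaped (chain, draw).
run_a = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # 2 chains x 3 draws
run_b = [[0.7, 0.8, 0.9], [1.0, 1.1, 1.2]]  # 2 chains x 3 draws

# Concatenating along "chain" stacks the chains of the same variable,
# giving 4 chains x 3 draws:
combined = run_a + run_b
print(len(combined), len(combined[0]))  # 4 3
```

This is why it doesn't fit the question below: it stacks extra chains of one variable rather than putting two different variables side by side.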

I see where I've caused confusion; I don't think that is quite what I want, sorry.

What I’d like is a function returning a new idata with both beta1 and beta2 in the posterior group in the minimal example below. Does this exist?

import pymc as pm

with pm.Model() as m1:
    beta1 = pm.Normal("beta1", 0, 1, shape=2)
    idata1 = pm.sample()
    idata1.extend(pm.sample_prior_predictive())

with pm.Model() as m2:
    beta2 = pm.Normal("beta2", 0, 1, shape=2)
    idata2 = pm.sample()
    idata2.extend(pm.sample_prior_predictive())

new_idata = SOMEFUNCTION(idata1, idata2)

as if I’d called pm.sample on the model with both

When you say call "pm.sample on the model with both", do you mean something more like this, where the two models still have their own separate likelihoods:

import pymc as pm
import arviz as az
import numpy as np

data1 = np.random.normal(0, 1, size=1000)
data2 = np.random.normal(0, 1, size=1000)

with pm.Model() as m1:
    beta1 = pm.Normal("beta1", 0, 1)
    beta2 = pm.Normal("beta2", 0, 1)

    obs1 = pm.Normal("obs1", beta1, 1, observed=data1)
    obs2 = pm.Normal("obs2", beta2, 1, observed=data2)

    idata = pm.sample()
    idata.extend(pm.sample_prior_predictive())

with m1:
    pm.sample_posterior_predictive(idata, extend_inferencedata=True)


so that, for instance, the posterior_predictive group contains dimensions:

(chain: 4, draw: 1000, obs1_dim_2: 1000, obs2_dim_2: 1000)

Precisely (but obviously without wrapping them both in one model context and re-running; assume that I already have traces from two models that are expensive to sample from).

Well, I don't know of any available function that does that, but I have not dabbled with InferenceData objects in detail. If no such function exists, I guess one can start doing it manually. For instance, if you have idata1 and idata2 sampled independently and you have their posterior_predictive groups, then

xr.concat([idata1["posterior_predictive"], idata2["posterior_predictive"]], dim=["chain","draw"])

creates the same combination as you would get above from the combined model. So something along the lines of the following might do it for you:

import arviz as az
import xarray as xr

idata3 = az.InferenceData()
group_to_data = {}

for key in ["posterior_predictive", "observed_data"]:
    group_to_data[key] = xr.concat(
        [idata1[key], idata2[key]], dim=["chain", "draw"]
    )

idata3.add_groups(group_to_data)

I have checked it and it seems to do the right thing for posterior_predictive and observed_data; you would need to check the others too.

Update: it turns out merge is the right way, not concat. So the final merge function looks like:

import arviz as az
import xarray as xr

def merge_independent_inferences(idata1, idata2):
    idata_combined = az.InferenceData()
    group_to_data = {}

    base_keys = {"posterior", "posterior_predictive", "prior",
                 "prior_predictive", "observed_data"}
    common_keys = set(idata1.groups()).intersection(idata2.groups())

    for key in base_keys.intersection(common_keys):
        group_to_data[key] = xr.merge([idata1[key], idata2[key]])

    idata_combined.add_groups(group_to_data)

    return idata_combined

and then, if you do for instance az.plot_ppc(idata_combined) or az.plot_posterior(idata_combined), it produces plots for both models which look identical to the plots you get from the combined model.
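The reason merge (rather than concat) is the right operation can be sketched with plain dicts standing in for the posterior groups (a toy illustration with made-up numbers; the variable names beta1/beta2 follow the example above):

```python
# The two posteriors contain DIFFERENT variables over the same
# (chain, draw) layout, so the combined group should be a union of
# variables, like a dict merge, not extra chains of one variable.
post1 = {"beta1": [[0.1, 0.2], [0.3, 0.4]]}  # 2 chains x 2 draws of beta1
post2 = {"beta2": [[0.5, 0.6], [0.7, 0.8]]}  # 2 chains x 2 draws of beta2

merged = {**post1, **post2}  # both variables, chain/draw shapes unchanged
print(sorted(merged))  # ['beta1', 'beta2']
```

xr.merge does the analogous thing for xarray Datasets, aligning on the shared chain and draw coordinates.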


That is absolutely perfect, thank you so much!