Edit ArviZ data structure

j_catulo · June 21, 2022, 1:32pm

Hi!

I am trying to fix a label switching problem in a mixture model by post-processing the inference data I obtained during sampling and re-labeling some of the components. However, to do this, I need to edit my posterior trace. I want to do something like

trace.sel(chain=[1]).posterior['mu'][0][:,0] = trace.sel(chain=[1]).posterior['mu'][0][:,1]

But, by doing this, trace is unaltered. Is there a way to edit ArviZ data structures?

cluhmann · June 21, 2022, 1:39pm

I suspect that you will need to use the operations described here. Note that xarray explicitly states that assignment using .sel() will fail silently, which is probably what is happening when you run your assignment code.
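
A minimal sketch of the difference, assuming a toy DataArray in place of the real posterior (the dimension name component and all values are made up for illustration). Selecting with a list indexer returns a copy, so assigning into the result of .sel() never reaches the original, while .loc writes in place:

```python
import numpy as np
import xarray as xr

# Toy stand-in for a posterior variable: 2 chains, 3 draws, 2 mixture components.
mu = xr.DataArray(
    np.arange(12, dtype=float).reshape(2, 3, 2),
    dims=("chain", "draw", "component"),
    coords={"chain": [0, 1], "draw": [0, 1, 2], "component": [0, 1]},
    name="mu",
)
original = mu.copy(deep=True)

# Assigning through .sel() with a list indexer writes into a temporary copy,
# so mu itself is left untouched and no error is raised.
mu.sel(chain=[1])[:] = -1.0
sel_changed_mu = not mu.equals(original)

# Assigning through .loc[] writes into the array itself.
mu.loc[dict(chain=[1])] = -1.0
loc_changed_mu = not mu.equals(original)

print(sel_changed_mu, loc_changed_mu)
```

Only the second assignment changes mu; chain 0 is untouched in both cases.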

OriolAbril · June 21, 2022, 2:19pm

Commenting mostly to ratify the link provided by @cluhmann. I think in this specific situation you want .loc.

Extra tangential notes:

You can use post = trace.posterior as indicated in Working with InferenceData — ArviZ dev documentation to work with the posterior Dataset directly in a less verbose way. Python assigns by reference, so modifying post will modify trace.posterior unless you make a copy when defining post.
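
As a rough illustration of the reference-versus-copy point, here is a sketch that uses a SimpleNamespace as a hypothetical stand-in for the InferenceData object (only the trace.posterior attribute access matters here; the variable mu and its values are made up):

```python
from types import SimpleNamespace

import numpy as np
import xarray as xr

# Hypothetical stand-in for an InferenceData object holding a posterior Dataset.
trace = SimpleNamespace(
    posterior=xr.Dataset(
        {"mu": (("chain", "draw"), np.zeros((2, 3)))},
        coords={"chain": [0, 1], "draw": [0, 1, 2]},
    )
)

post = trace.posterior                  # same object under a shorter name
post["mu"].loc[dict(chain=1)] = 5.0     # also visible through trace.posterior

safe = trace.posterior.copy(deep=True)  # independent copy
safe["mu"].loc[dict(chain=0)] = -1.0    # trace.posterior is not affected
```

After running this, trace.posterior["mu"] holds 5.0 for chain 1 and still 0.0 for chain 0, because only the edit made through post reached it.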

Use of positional indexing and multiple indexing approaches

Combining sel(chain=1) with multiple instances of positional indexing is a recipe for disaster. In xarray the dimension order should be irrelevant, only the dimension name matters, and there are xarray functions that modify the dimension order. Here it looks like you are only using sel right after the trace is obtained, so nothing should go wrong, but it is bad practice to rely on dimensions being in a given order instead of using their names. Moreover, by using label indexing you won’t need to use : for dimensions you don’t want to index, simply skip it.
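
To make the risk concrete, here is a small sketch on a made-up array (the component dimension and the values are assumptions for illustration). The same positional index picks different data once the dimensions are reordered, while selection by name does not:

```python
import numpy as np
import xarray as xr

mu = xr.DataArray(
    np.arange(24).reshape(2, 4, 3),
    dims=("chain", "draw", "component"),
    coords={"chain": [0, 1], "draw": np.arange(4), "component": [0, 1, 2]},
)
reordered = mu.transpose("component", "chain", "draw")

# Label-based selection returns the same values whatever the dimension order,
# and dimensions that are not indexed (draw) are simply left out.
by_name = mu.sel(chain=1, component=0)
by_name_reordered = reordered.sel(chain=1, component=0)

# Positional indexing depends on the order: [1, :, 0] means chain=1, component=0
# on mu, but component=1, draw=0 on the reordered array.
by_position = mu[1, :, 0]
by_position_reordered = reordered[1, :, 0]

print(by_name.dims, by_position_reordered.dims)
```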

j_catulo · June 21, 2022, 8:25pm

@cluhmann @OriolAbril thank you both for your answers and advice! I was able to implement it.

I was not using the name of the variables only because I was defining multiple random variables under the same name. Here is an example of what I was doing:

mu = pm.Normal('mu', mu=[0, 1, 2], sigma=[2, 2, 2])

I modified it and it really helped me! Now the code is much cleaner.

OriolAbril · June 21, 2022, 9:53pm

This is perfectly valid and recommended practice. But I’d recommend you annotate the dimension in the mu variable so that you can select the component of mu with prior mean 1 with sel(dim="coord_for_1").

You might find this blogpost I wrote (on Oriol Unraveled) and the "Working with InferenceData" page of the ArviZ dev documentation useful.

j_catulo · June 22, 2022, 10:44am

Thanks for your advice. I defined the dimensions for the random variable. However, if I have to select the components using .sel, wouldn’t that create the problem that @cluhmann described?

Note that xarray explicitly states that assignment using .sel() will fail silently, which is probably what is happening when you run your assignment code

OriolAbril · June 22, 2022, 11:14am

You can’t use .sel on the left-hand side of the assignment, because it then fails silently. @cluhmann mentioned this explicitly because that is what was happening in your original example. But before that he shared a link to the assigning values with indexing section of the xarray docs.

There it is explained that to modify existing objects you need to use .loc or .where.

Assuming your mu has a single dimension dim of length 3 with coordinate values a, b, c, something like:

<xarray.Dataset>
Dimensions:  (chain: 4, draw: 10, dim: 3)
Coordinates:
  * chain    (chain) int64 0 1 2 3
  * draw     (draw) int64 0 1 2 3 4 5 6 7 8 9
  * dim      (dim) <U1 'a' 'b' 'c'
Data variables:
    x        (chain, draw, dim) float64 -1.439 -0.1629 0.6451 ... -1.547 1.534

then, to swap the content of components a and b but only for chains 0 and 3 you can do:

ds.loc[dict(chain=[0, 3], dim=["b", "a"])] = ds.sel(
    chain=[0, 3], dim=["a", "b"]
).assign_coords(dim=["b", "a"])  # important to update the coords if present
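
For anyone who wants to try the swap end to end, here is a self-contained sketch that builds a random dataset with the same layout as above and checks the result. The random values and the seed are arbitrary, and it assumes an xarray version that supports assigning a Dataset through Dataset.loc:

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
ds = xr.Dataset(
    {"x": (("chain", "draw", "dim"), rng.normal(size=(4, 10, 3)))},
    coords={"chain": np.arange(4), "draw": np.arange(10), "dim": ["a", "b", "c"]},
)
before = ds.copy(deep=True)  # kept only to verify the swap afterwards

# Swap components a and b, but only for chains 0 and 3.
ds.loc[dict(chain=[0, 3], dim=["b", "a"])] = ds.sel(
    chain=[0, 3], dim=["a", "b"]
).assign_coords(dim=["b", "a"])

# In chains 0 and 3, the new "a" holds the old "b" values and vice versa.
swapped_a = np.array_equal(
    ds["x"].sel(chain=[0, 3], dim="a").values,
    before["x"].sel(chain=[0, 3], dim="b").values,
)
swapped_b = np.array_equal(
    ds["x"].sel(chain=[0, 3], dim="b").values,
    before["x"].sel(chain=[0, 3], dim="a").values,
)
# Chains 1 and 2 are left exactly as they were.
others_untouched = ds.sel(chain=[1, 2]).equals(before.sel(chain=[1, 2]))

print(swapped_a, swapped_b, others_untouched)
```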

j_catulo · June 22, 2022, 11:18am

This is really helpful and the code is very clean! I will implement it!

Thanks!
