Edit ArviZ data structure

Hi!

I am trying to fix a label switching problem in a mixture model by post-processing the inference data I obtained during sampling and re-labeling some of the components. To do this, I need to edit my posterior trace. I want to do something like

trace.sel(chain=[1]).posterior['mu'][0][:,0] = trace.sel(chain=[1]).posterior['mu'][0][:,1]

But, by doing this, trace is unaltered. Is there a way to edit ArviZ data structures?

I suspect that you will need to use the operations described in the "Assigning values with indexing" section of the xarray docs. Note that xarray explicitly states that assignment using .sel() will fail silently, which is probably what is happening when you run your assignment code.

Commenting mostly to ratify the link provided by @cluhmann. I think in this specific situation you want .loc.
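To make the difference concrete, here is a minimal sketch on a toy array. The array `mu`, its shape, and the dimension name `component` are made up for illustration and only stand in for `trace.posterior['mu']`; it uses xarray alone, so no sampling is needed.

```python
import numpy as np
import xarray as xr

# Toy stand-in for trace.posterior["mu"]: 2 chains, 3 draws, 2 components.
mu = xr.DataArray(
    np.arange(12, dtype=float).reshape(2, 3, 2),
    dims=("chain", "draw", "component"),
    coords={"chain": [0, 1], "draw": [0, 1, 2], "component": ["a", "b"]},
)
before = mu.copy(deep=True)

# Selecting with a list returns a new object, so this assignment
# writes into a temporary copy and `mu` itself is left untouched.
mu.sel(chain=[1])[:, :, 0] = -1.0
unchanged = bool((mu == before).all())

# With .loc on the left-hand side the original array is modified.
mu.loc[dict(chain=[1], component="a")] = -1.0
changed = bool((mu.sel(chain=1, component="a") == -1.0).all())

print(unchanged, changed)
```

The first assignment raises no error, which is what makes the problem easy to miss.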


Extra tangential notes:

You can use post = trace.posterior as indicated in Working with InferenceData — ArviZ dev documentation to work with the posterior Dataset directly in a less verbose way. Python assigns by reference, so modifying post will modify trace.posterior unless you make a copy when defining post.
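A small sketch of the reference behaviour, using a hand-built Dataset as a hypothetical stand-in for `trace.posterior` (the variable name `mu` and the shapes are assumptions for illustration):

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for trace.posterior
posterior = xr.Dataset(
    {"mu": (("chain", "draw"), np.zeros((2, 3)))},
    coords={"chain": [0, 1], "draw": [0, 1, 2]},
)

post = posterior                      # same object under a second name
post["mu"].loc[dict(chain=0)] = 1.0   # the change is visible through `posterior` too

safe = posterior.copy(deep=True)      # independent copy of the underlying data
safe["mu"].loc[dict(chain=1)] = 5.0   # `posterior` is not affected by this
```

Passing `deep=True` explicitly matters here, since a shallow copy of a Dataset can still share the underlying arrays with the original.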

:warning: Use of positional indexing and multiple indexing approaches :warning:

Combining sel(chain=1) with multiple instances of positional indexing is a recipe for disaster. In xarray the dimension order should be irrelevant, only the dimension name matters, and there are xarray functions that modify the dimension order. Here it looks like you are only using sel right after the trace is obtained, so nothing should go wrong, but it is bad practice to rely on dimensions being in a given order instead of using their names. Moreover, by using label indexing you won’t need to use : for dimensions you don’t want to index; simply skip them.
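As a rough illustration of why the order matters, the sketch below builds a toy array (the names `mu` and `component` are assumptions, not taken from the original model), reorders its dimensions with `transpose`, and compares positional and label-based indexing:

```python
import numpy as np
import xarray as xr

mu = xr.DataArray(
    np.arange(24).reshape(2, 4, 3),
    dims=("chain", "draw", "component"),
    coords={"chain": [0, 1], "draw": np.arange(4), "component": ["a", "b", "c"]},
)

# Same data, but with the dimensions in a different order.
mu_t = mu.transpose("component", "draw", "chain")

# Positional indexing depends on the order, so these select different things.
by_position = mu[1, :, 0]      # chain 1, all draws, component "a"
by_position_t = mu_t[1, :, 0]  # component "b", all draws, chain 0

# Label indexing gives the same result either way, and `draw` is simply omitted.
by_label = mu.sel(chain=1, component="a")
by_label_t = mu_t.sel(chain=1, component="a")
```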

@cluhmann @OriolAbril thank you both for your answers and advice! I was able to implement it.

I was not using the name of the variables only because I was defining multiple random variables under the same name. Here is an example of what I was doing

mu = pm.Normal('mu', mu = [0,1,2],  sigma = [2,2,2])

I modified it and it really helped me! Now the code is much cleaner.

This is perfectly valid and recommended practice. But I’d recommend you annotate the dimension in the mu variable so that you can select the component of mu with prior mean 1 with sel(dim="coord_for_1").

You might find this blog post I wrote, PyMC 4.0 with labeled coords and dims | Oriol unraveled, and Working with InferenceData — ArviZ dev documentation useful.

Thanks for your advice. I defined the dimensions for the random variable. However, if I have to select the components using .sel, wouldn’t that create the problem that @cluhmann described?

Note that xarray explicitly states that assignment using .sel() will fail silently, which is probably what is happening when you run your assignment code

You can’t use .sel on the left-hand side of the equals sign, because the assignment then fails silently. @cluhmann mentioned this explicitly because that is what was happening in your original example. But before that he shared a link to the assigning values with indexing section of the xarray docs.

There it is explained that to modify existing objects you need to use .loc or .where.

Assuming your mu has a single dimension dim of length 3 with coordinate values a, b, c, something like:

<xarray.Dataset>
Dimensions:  (chain: 4, draw: 10, dim: 3)
Coordinates:
  * chain    (chain) int64 0 1 2 3
  * draw     (draw) int64 0 1 2 3 4 5 6 7 8 9
  * dim      (dim) <U1 'a' 'b' 'c'
Data variables:
    x        (chain, draw, dim) float64 -1.439 -0.1629 0.6451 ... -1.547 1.534

then, to swap the content of components a and b but only for chains 0 and 3 you can do:

ds.loc[dict(chain=[0, 3], dim=["b", "a"])] = ds.sel(
    chain=[0, 3], dim=["a", "b"]
).assign_coords(dim=["b", "a"])  # important to update the coords if present
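If you want to check the swap on toy data first, here is a self-contained sketch. It builds a random Dataset with the same layout as above and is a variant of the snippet, not a copy of it: it works on the `x` DataArray and passes plain `.values` on the right-hand side, so the labels are dropped and position alone decides where each value lands. The random data and the `original` copy exist only for the check.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
ds = xr.Dataset(
    {"x": (("chain", "draw", "dim"), rng.normal(size=(4, 10, 3)))},
    coords={"chain": np.arange(4), "draw": np.arange(10), "dim": ["a", "b", "c"]},
)
original = ds.copy(deep=True)  # kept only to compare against afterwards

da = ds["x"]
# Right-hand side: values of "a" then "b" for chains 0 and 3 (a copy, labels dropped).
# Left-hand side: slots "b" then "a", so "a" lands in "b" and "b" lands in "a".
da.loc[dict(chain=[0, 3], dim=["b", "a"])] = da.sel(
    chain=[0, 3], dim=["a", "b"]
).values

swapped = np.array_equal(
    ds["x"].sel(chain=[0, 3], dim="a").values,
    original["x"].sel(chain=[0, 3], dim="b").values,
)
print(swapped)
```

Chains 1 and 2 and component c are left as they were, which you can verify against `original` in the same way.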

This is really helpful and the code is very clean! I will implement it!

Thanks!