API question: What are the semantics of array-style indexing on a Distribution object, with another PyMC object?

mdrum · May 6, 2022, 7:20am

I am new to PyMC and I am studying the Simpson’s Paradox tutorial. The last line in Models 2 and 3 (cells 12 and 17) is

pm.Normal("y", mu=μ, sigma=sigma[g], observed=data.y, dims="observation")

where sigma is previously bound to a pm.HalfCauchy, i.e.,

sigma = pm.HalfCauchy("sigma", beta=2, dims="group")

and g to an instance of pm.Data, i.e.,

g = pm.Data("g", data.group_idx, dims="observation")

I am wondering about the semantics of sigma[g]. Evidently, __getitem__ has been overridden on pm.HalfCauchy so that indexing into it with a pm.Data instance makes sense (note that the result of the indexing operation is an AdvancedSubtensor.0). But what does such an indexing operation mean in this context, and where might the override be documented? (Please note: I get the impression, from what I have seen in the docs and elsewhere so far, that this pattern of indexing into one PyMC object with another is rare; is my impression correct?)
Thank you very much.

jessegrabowski · May 6, 2022, 8:58am

Theano/Aesara tensors (and by extension PyMC variables) can be indexed exactly as numpy arrays. The numpy documentation on integer-array indexing should answer your question.

The long and short of it is that you can index an array with an array of integers to create a new array of arbitrary length, whose items are picked (possibly repeatedly) from the original. In the case of the tutorial, sigma is a tensor of shape (4,), and g is a (100,) array of [0, 0, ..., 1, 1, ..., 2, 2, ..., 3, 3, ..., 3]. Indexing sigma[g] returns a (100,) tensor where the ith element is sigma[g[i]].
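
To make that concrete, here is a minimal sketch in plain NumPy. The sigma values and the even 25-per-group split are made up for illustration and are not taken from the tutorial's data; in the model the same indexing is applied to a symbolic tensor rather than a concrete array.

```python
import numpy as np

# Hypothetical stand-in values for the four per-group sigmas.
sigma = np.array([0.5, 1.0, 1.5, 2.0])   # shape (4,)

# One group index per observation; an even 25/25/25/25 split is assumed here.
g = np.repeat(np.arange(4), 25)          # shape (100,), values in {0, 1, 2, 3}

# Integer-array indexing: one sigma per observation, looked up by group.
sigma_per_obs = sigma[g]                 # shape (100,)

# Element i of the result is sigma[g[i]].
assert sigma_per_obs.shape == g.shape
assert all(sigma_per_obs[i] == sigma[g[i]] for i in range(len(g)))
```

So sigma[g] just expands the four group-level values out to the observation level, which is why it can be passed to a likelihood that has one row per observation.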

Au contraire, this is a very common way to write down a hierarchical model.

mdrum · May 9, 2022, 3:16am

Thank you for your kind reply and pointers!
