Array selection operations

In my hierarchical modeling, I have found that I often want to select some parameters for a particular output based on values of predictors.

For example, cell cultures grown at a high temperature get one parameter value, while those grown at a low temperature get another.

I copied this approach from the method used by @junpenglao in the schizophrenia example case.

It occurs to me that this is probably a common thing to do using a probabilistic programming language, so it might be worth having a kind of expression in PyMC that supports it directly.

Right now, when I do this, I end up with cumbersome expressions: I vectorize an index to do the lookup and then extract a scalar random variable from a vector-shaped random variable. In one case, even worse, I had to hand-build a Theano vector out of some random variables and some constants. What I write is hard to read, error-prone, and a bear to debug.
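For concreteness, the pattern I mean is plain integer (fancy) indexing: a per-observation index array selects which entry of a parameter vector applies to each data point. A minimal NumPy sketch (the names and numbers here are mine, purely for illustration):

```python
import numpy as np

# Hypothetical setup: two growth temperatures, one tau parameter each.
tau = np.array([0.5, 2.0])          # parameter vector, one entry per group
group = np.array([0, 1, 1, 0, 1])   # per-observation group index (constant data)

# Vectorized lookup: each observation gets the tau of its group.
tau_per_obs = tau[group]
```

In a PyMC model, `tau` would be a vector-valued random variable, and the same `tau[group]` indexing should work on the resulting tensor.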

Is there a cleaner way of doing this that I just don’t know about? Or would it be worth adding some kind of construct to the language to support it?

Also, are there other examples of this than the schizophrenia one?


You mean this one right? https://github.com/junpenglao/PrecisionWorkshop1_Prep/blob/master/notebooks/3g.%20Case%20study_supplementary.ipynb

I am not sure where the vectorized index comes in; if you are using marginalization (which is the recommended approach), you don’t need to index into the latent states.

As for having a helper function to do automatic marginalization, I think the easiest way to do so is using pm.Potential.
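For reference, marginalizing out a discrete latent by hand just means summing the joint over its states in log space. A toy NumPy sketch of the idea (a two-state latent with made-up numbers, not the notebook's actual model):

```python
import numpy as np

# Toy two-state latent z: prior P(z=0)=0.3, P(z=1)=0.7, and a Normal(mu[z], 1)
# likelihood whose mean depends on z (all numbers made up).
y = 1.2
mu = np.array([0.0, 3.0])
log_prior = np.log(np.array([0.3, 0.7]))
log_lik = -0.5 * (y - mu) ** 2 - 0.5 * np.log(2 * np.pi)  # Normal(mu, 1) logpdf

# Marginalize z in log space: log P(y) = logsumexp_z [ log P(z) + log P(y | z) ]
log_marginal = np.logaddexp(*(log_prior + log_lik))
```

In PyMC, a log-marginal term like this is what you would hand to `pm.Potential` to add it to the model's log-probability.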

The difference in my case is that the “selector” (your Z_{i, j}) is a constant, not a random variable.

Therefore, for each observed data point, the model deterministically chooses one factor or the other, so there are \tau_1 and \tau_2.

When I tried to use that in my model as x \tau_1 + (1 - x) \tau_2 for x \in \{0, 1\}, I got very bad results, because I was diluting 50% posterior samples of the tau’s with 50% prior samples, since each value of x makes one of the tau’s independent of the observation.

With the vectorized index approach, there is a one-to-one match between samples of the tau’s and observations.

It’s possible I am just doing this in a bad way, though. I suppose I could try to partition the observations and have a model with multiple explicit submodels that simply share hyperparameters.
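The partitioning idea above, sketched in NumPy (hypothetical data; in a real PyMC model each partition would become its own observed node in an explicit submodel, with the hyperparameters shared across submodels):

```python
import numpy as np

# Hypothetical observations with a constant 0/1 condition per data point.
obs = np.array([0.4, 2.1, 1.9, 0.6, 2.2])
condition = np.array([0, 1, 1, 0, 1])

# Partition the data once, up front; each subset then gets its own
# likelihood term, while the hyperparameters stay shared.
partitions = {int(k): obs[condition == k] for k in np.unique(condition)}
```

This trades the indexing tricks for a slightly more verbose model, but each submodel's likelihood is then straightforward to read.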

I will try to make a simple model in a notebook, but I am so busy making this model work that other tasks keep getting crowded out.

Here’s a simple example notebook. Note that it also shows that the display of my Deterministic, which is intended to extract one of the two different sigmas for easy reference, seems to be broken.

Thanks for any suggestions!

Oh, if that’s the case you should not do any marginalization; indexing into it is probably the only way to model it.

This is a great idea if the categories contain similar numbers of observations and you don’t have too many categories.
