Custom likelihood function

I’m trying to build a model that is quite different from the pymc3 examples I’ve seen, and I’m having a hard time figuring out how to fit it into this framework. It’s a classic model used in cognitive science for modeling concept learning. The model itself is very simple, but the likelihood is very non-standard, and I’m not sure of the best way to define it here.

Here’s my problem. Imagine there’s a computer program that generates numbers according to some rule that is unknown to you. That rule might be something like “even numbers,” “odd numbers,” or “powers of 2,” and so on. You observe numbers and make predictions about what number the program is going to generate next. The numbers range from 1 to 100 and there are 100 possible rules that the program may be using.

We’ll assume a uniform prior over rules. In pymc3, I’d use a categorical distribution with uniform probability over the 100 categories:

import numpy as np
import pymc3 as pm

rules_prior = np.ones(100) / 100  # uniform prior over the 100 rules
with pm.Model():
    rules = pm.Categorical('rules', p=rules_prior)

The likelihood of a number given a rule is:

p(n_i | r_j) = \frac{1}{|r_j|}

That is, the likelihood of n_i given r_j is 1 over the number of numbers in the range 1 to 100 that r_j ‘picks out’ (and 0 for any number the rule does not pick out). For example, if r_j is ‘even numbers,’ it picks out 50 numbers, so each even number has likelihood 1/50.

It’s easiest to pre-compute this likelihood and store it in a matrix that can be indexed by number and rule. I could define my custom likelihood function as something that takes in an index and returns the value stored in this matrix. However, this does not work well with pymc3, particularly because my indices are random theano variables that can’t function as indices before they are evaluated, so I get type errors when building the model.
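For concreteness, here is a rough sketch of that pre-computation with a few illustrative rules (the rule names and masks are just examples; the real model enumerates all 100 rules):

import numpy as np

numbers = np.arange(1, 101)

# A few example rules as boolean masks over 1..100; the full
# model would enumerate all 100 rules.
rules_spec = {
    'even': numbers % 2 == 0,
    'odd': numbers % 2 == 1,
    'powers_of_2': np.isin(numbers, 2 ** np.arange(1, 7)),
}

# lik[j, n] = p(number n+1 | rule j): 1/|r_j| if rule j picks
# out that number, 0 otherwise.
lik = np.zeros((len(rules_spec), 100))
for j, mask in enumerate(rules_spec.values()):
    lik[j, mask] = 1.0 / mask.sum()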

Any suggestions for a better way to create this custom likelihood?

Using a categorical variable as an index often doesn’t work well, because there is no out-of-the-box algorithm that can sample discrete parameters efficiently. The usual treatment in this case is to use a mixture.
I would go with a similar approach to the one you described, i.e., “pre-compute this likelihood and store it in a matrix that can be indexed by number and rule.” As there are 100 numbers × 100 rules, this is a 100×100 matrix. Just to clarify: are you trying to model your participants’ mental model? What are the observed data in this case?

When you say to use a mixture, what would the mixture distribution be here? Would I still be using random variables as indices?

In my current attempt, I’ve written a wrapper function for indexing the matrix, which I feed to pm.DensityDist or pm.Potential.
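Roughly, my attempt looks like this sketch (names are illustrative; lik stands in for the pre-computed matrix, and wrapping it as a theano tensor is how I get the latent categorical to index it symbolically):

import numpy as np
import pymc3 as pm
import theano.tensor as tt

# Stand-in for the pre-computed 100x100 likelihood matrix
# (rows = rules, columns = numbers 1..100).
lik = np.random.dirichlet(np.ones(100), size=100)
obs = np.array([2, 4, 8, 16]) - 1  # observed numbers, zero-indexed

with pm.Model():
    rules = pm.Categorical('rules', p=np.ones(100) / 100)
    # Wrap the matrix as a theano tensor so the latent categorical
    # can index it symbolically rather than as a plain numpy index.
    log_lik = tt.as_tensor_variable(np.log(lik))
    pm.Potential('loglike', log_lik[rules][obs].sum())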

I am modeling participants’ beliefs about which rule the program is using. The observed data are participants’ predictions for which other numbers the program will generate.

Right, so in this case you have a latent categorical random variable describing the participant’s unobserved belief. In a mixture model, instead of representing the participant’s belief as a category that you use to index the 100×100 matrix, you represent it as a weight vector and take a weighted combination of the rows of this 100×100 matrix, which marginalizes the discrete variable out of the model.
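In code, that marginalization could look something like this sketch (a Dirichlet weight vector replaces the categorical index, and the discrete rule variable is summed out with logsumexp; lik again stands in for the pre-computed matrix):

import numpy as np
import pymc3 as pm
import theano.tensor as tt

# Stand-in for the pre-computed 100x100 likelihood matrix
# (rows = rules, columns = numbers 1..100).
lik = np.random.dirichlet(np.ones(100), size=100)
obs = np.array([2, 4, 8, 16]) - 1  # observed numbers, zero-indexed

# Per-rule log-likelihood of the whole observation set:
# per_rule[j] = sum_i log p(n_i | r_j).
per_rule = np.log(lik)[:, obs].sum(axis=1)

with pm.Model():
    # The participant's belief as a weight vector over rules
    # rather than a hard categorical index.
    w = pm.Dirichlet('w', a=np.ones(100))
    # Marginal log-likelihood: log sum_j w_j * exp(per_rule[j]);
    # the discrete variable is integrated out instead of sampled.
    pm.Potential('loglike', pm.math.logsumexp(tt.log(w) + per_rule))

Worth noting: with a fixed uniform prior over rules, the posterior over rules is available in closed form (normalize exp(per_rule)), so sampling only becomes necessary once the model grows richer than this.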