Help! DM model not behaving as expected when sampling

I have created a model of bags of skittles largely based on the documentation for the Dirichlet Multinomial:


import numpy as np
import pymc as pm

seeds = [1616, 3232, 6464, 1985]
colours = ["red", "orange", "yellow", "green", "purple"]
bags = ["skittles"]
coords = {"colour": colours, "bag": bags}

k = len(colours)
n = len(bags)

# Simulate 1000 bags with per-colour counts between 3 and 16,
# then group the bags by their total count
observed_by_size = {}
simulated_observed = [[int(3 + np.random.rand() * 14) for i in range(5)] for j in range(1000)]

for sample in simulated_observed:
    total = sum(sample)
    observed_by_size.setdefault(total, []).append(sample)

skittles_model = pm.Model(coords=coords)

with skittles_model:
    num_skittles_prior = pm.Gamma("num_skittles_prior", alpha=45, beta=1)
    num_skittles = pm.Poisson("num_skittles", mu=num_skittles_prior)
    frac = pm.Dirichlet("frac", a=np.ones(k), dims="colour")
    conc = pm.Lognormal("conc", mu=1, sigma=1)
    bag_of_skittles = pm.DirichletMultinomial(
        "bag_of_skittles",
        n=num_skittles,
        a=frac * conc,
        # feed the model only the observations that match the current num_skittles
        observed=observed_by_size.get(num_skittles, None),
        dims=("bag", "colour"),
    )

But when I sample from this model:

with skittles_model:
    idata = pm.sample(draws=1000, cores=4, random_seed=seeds)

I do not get the expected result in two ways. First, the num_skittles variable never varies. It stays stuck on 44. Second, all four chains produce identical results despite passing in four different numbers for the random seed. When I sample from just the prior it behaves as expected.

Can anyone explain to me where I’ve gone wrong here?

num_skittles is completely determined by the observation, only the value 44 is valid.

Even without observing the variable, it’s not currently possible to do MCMC sampling of the n parameter of a Multinomial or DirichletMultinomial, because PyMC samples discrete variables one at a time (in a stepwise, Gibbs-like fashion), and the only valid value of n is the one that agrees with the current sum of the values of the respective Multinomial/DM.

A specialized sampler would be needed to sample the two together. This is still irrelevant if the distribution is observed as in your example, because there’s no uncertainty about n then.

Thanks for your reply! If it’s a limitation of the sampler then I’ll have to accept that. I suppose the strategy would be to split it into two separate models; one for the number of skittles in a bag and another for the fractions of the colours.

It’s not clear to me why you state that 44 is the only valid value for this variable? In the simulated observations the sum of each sample varies widely.

I’m also still confused as to why the chains produce identical traces. Fair enough, the num_skittles value seems to be limited by the sampler, but the actual counts from the multinomial are also identical in each chain.

When you set the observation, you are fixing the value to 44. Prior predictive ignores the observations (hence why it’s a prior predictive).

I’m sorry, I’m still not following? Where do I set the observation to 44?

The only place I can see the number 44 coming from is that it’s the mode of the gamma with a=45 and b=1? Wouldn’t I expect to see a different number with different random seeds?
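For reference, the mode of a Gamma with rate parameterization is (alpha - 1) / beta, which for alpha=45, beta=1 is indeed 44. A quick NumPy sanity check, grid-maximizing the unnormalized log-density (the grid range is arbitrary):

```python
import numpy as np

# Unnormalized log-density of Gamma(alpha=45, beta=1) on an integer grid
alpha, beta = 45, 1
x = np.arange(1, 200)
log_pdf = (alpha - 1) * np.log(x) - beta * x

print(x[np.argmax(log_pdf)])  # mode = (alpha - 1) / beta = 44
```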

Is it possible that num_skittles is just being stuck at the first value that Gamma prior forces it to be? For instance, if you do

num_skittles = pm.Poisson("num_skittles", mu=num_skittles_prior, initval=14)

you get a constant value of 14 for the posterior even when the gamma alpha is 45. Indeed, looking at the model, I do not see why one value of num_skittles would be preferable to another. In fact, I don’t quite understand why you would want to estimate the fraction of each colour when, in the simulated data, they come from completely random uniform draws each time and not from some ground-truth set of real fractions.

Nevertheless, I understand that you have a batch of observations where the total number of draws differs, and you want to use these as observations. Maybe one way is to write a separate likelihood for each total count (the number of repeats would be the batch size for that distribution). If there are too many distinct totals this might be unfeasible. Another way is probably along the lines of what Ricardo has suggested: write a custom sampler for num_skittles. It would not be updated by the sampler, just drawn randomly at each step. The analogous code for a normal would be

import numpy as np
import pymc as pm
# In recent PyMC versions BlockedStep lives in pymc.step_methods.compound;
# older versions had it in pymc.step_methods.arraystep
from pymc.step_methods.compound import BlockedStep

rng = np.random.default_rng()

class NormalNoiseStep(BlockedStep):
    """Step method that ignores the current state and simply redraws the
    variable from a fixed normal at every iteration."""

    def __init__(self, var, mean, sd, size):
        model = pm.modelcontext(None)
        value_var = model.rvs_to_values[var]
        self.vars = [value_var]
        self.name = value_var.name
        self.mean = mean
        self.sd = sd
        self.size = size

    def step(self, point: dict):
        draw = rng.normal(self.mean, self.sd, size=self.size)
        point[self.name] = draw
        return point, []

where inside the model you would instantiate it as

mean_c = 5
sd_c = 1
with pm.Model() as model:
    c = pm.Normal("c", mean_c, sd_c, size=1)
    steps = [NormalNoiseStep(c, mean_c, sd_c, size=1)]
    # do stuff with c
    idata = pm.sample(1000, step=steps)

There is no risk of it getting stuck here. But note again, you are not trying to estimate anything related to num_skittles here; you are just adding it to the system as a random source of variability (which might be more in line with Approximate Bayesian Computation than the classical Bayesian framework, actually).

If you really want num_skittles to be a parameter to be estimated, you probably need to add some likelihood to the system that relates it to the distribution of observed total counts in your simulated data. Not 100% sure what would be the most reasonable way to do that.

No need to apologize. I said 44 because you said that’s the number you got stuck at.
Yes, it comes from the mean of the gamma, which is used to decide where to start sampling. You can probably find it in model.initial_point()

The important thing is that only one value of n is possible after you define the observations. That initial point either agrees with the observations or not. I assume 44 did, otherwise sampling wouldn’t even start.

import pymc as pm

with pm.Model() as m:
    n = pm.Poisson("n", 10)
    p = pm.Dirichlet("p", [1, 1, 1, 1, 1])
    # Once I condition on observations, the only valid value for `n` is 5
    y = pm.Multinomial("y", n=n, p=p, observed=[1, 0, 2, 1, 1])
ip = m.initial_point()
# {'n': array(10), 'p_simplex__': array([0., 0., 0., 0.])}
logp = m.compile_logp()
for n in range(10):
    ip["n"] = n
    print(n, logp(ip))
# 0 -inf
# 1 -inf
# 2 -inf
# 3 -inf
# 4 -inf
# 5 -10.487109097148673
# 6 -inf
# 7 -inf
# 8 -inf
# 9 -inf

In prior and posterior predictive the observations don’t play a role, so both n and the y are allowed to change.

I don’t entirely understand the intent of the model here. The observed data is, in part, determined by a random variable (num_skittles)? Why generate all that data if you aren’t going to use it?

Thanks for your response too! Yes, obviously for the simulated data it seems pointless, but the idea would be to gather and plug in real data! The posterior would hopefully reflect whatever mechanisms are used in manufacturing to prevent a given packet being illegally small, or wastefully large.

But in the real world, your data would be one set of known observations, not a random subset of some larger set of observations. No?

Ok, I understand better now, and it was helpful of the above poster to point out that it reflects the initval. I had assumed the initial value would be a random draw from the prior.

As you can see from my code, what I attempted to do was convert the observations to a dictionary keyed by total count and feed the model only the observations that agree with the current value of num_skittles at any given state. Clearly in practice this hasn’t worked!

Conceptually, what I think my model design expressed is that the Dirichlet has hyperparameters that are independent of num_skittles, which avoids the need for a separate DM for each possible value of n.

It seems to me now that the most sensible option is to separate this out into two models.
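As a sanity check on that split, here is a minimal NumPy sketch of the two-part generative story (the priors and the concentration value are illustrative, borrowing the Gamma-Poisson bag-size idea from my model above):

```python
import numpy as np

rng = np.random.default_rng(1616)
k = 5  # number of colours

# Model 1: bag size only -- n ~ Poisson(lam), lam ~ Gamma(alpha=45, beta=1)
lam = rng.gamma(shape=45, scale=1.0)
n_bag = rng.poisson(lam)

# Model 2: colour fractions only -- a Dirichlet-multinomial draw given the bag size
conc = 2.0                        # illustrative concentration
frac = rng.dirichlet(np.ones(k))  # base colour fractions
p = rng.dirichlet(frac * conc)    # per-bag colour probabilities
counts = rng.multinomial(n_bag, p)

print(n_bag, counts)  # the counts always sum to n_bag by construction
```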

I’m not sure I understand your confusion: if you bought a million bags of skittles and counted each one, then you would have a distribution of the number of skittles possible in a bag and of the colour distribution. I thought I could have it all in one model, but now see it’s probably better to learn the distributions separately. Of course the dispersion might reveal that there are different fractions for different bag sizes, but my working assumption is that this isn’t the case.

You can have it all in one model, but the observed data would be something like this:

bag 1

  • 5 yellow
  • 6 red
  • 3 green
  • (14 total)

bag 2

  • 1 yellow
  • 3 red
  • 7 green
  • (11 total)

bag 3

  • 4 yellow
  • 4 red
  • 2 green
  • (10 total)

The observed data wouldn’t be some random sampling of the bags or of the colors/counts and also wouldn’t be one specific bag. It would be the entire set of bags and their contents.

Oh yes, I see what you mean now. It was me trying to hack around the fact that at any given point, with a fixed n, the likelihood of most data points would be zero. I’d be interested to hear what you think the model should look like in this case?

Maybe something like this? Warning, I threw this together quickly and may have made very dumb mistakes.

import pymc as pm
import numpy as np

colours = ["red", "orange", "yellow", "green", "purple"]
data = np.array([[3, 2, 4, 23, 4],  # bag 1
                 [22, 1, 6, 2, 8]]) # bag 2
bags = list(range(data.shape[0]))
coords = {"colour": colours, "bag": bags}

k = len(colours)
n = len(bags)

with pm.Model(coords=coords) as skittles_model:
    frac = pm.Dirichlet("frac", a=np.ones(k), dims="colour")
    conc = pm.Lognormal("conc", mu=1, sigma=1)
    bag_of_skittles = pm.DirichletMultinomial(
        "bag_of_skittles",
        n=data.sum(axis=-1),  # each bag's total is known from the data
        a=frac * conc,
        observed=data,
        dims=("bag", "colour"),
    )
    idata = pm.sample()