Question about Discrete Mixture

I am trying to emulate the dice problem from Allen Downer’s Think Bayes book in PyMC. The idea is that you have dies of different size (4,6,8,12,20) and you pick one at random and roll it. The question is what can you infer about the picked die given an outcome (e.g., observing a 6).

I tried to model this using a Mixture of Discrete Uniform variables, having a Dirichlet prior for the weights. I would expect that the posterior weight for the first die would be zero, but this is not the case. This model is probably not appropriate for the problem, but I am not sure why. Any ideas?

dice_sizes = np.array((4, 6, 8, 12, 20))

with pm.Model() as m:
    dice = pm.DiscreteUniform.dist(lower=[1], upper=dice_sizes)
    weights = pm.Dirichlet('weights', a=np.ones(len(dice_sizes)))
    roll = pm.Mixture('roll', comp_dists=dice, w=weights, observed=6, dtype='int64')

    trace_m = pm.sample(1000)


-> array([0.16973499, 0.23528162, 0.21347763, 0.18884671, 0.19265905])

Thank you in advance!


Interesting, the problem here is that in Mixture class, the comp_dist is outputting a log_prob of -inf for dice 1 correctly, but the mixture log_prob use a pm.math.logsumexp that making none zero weight still valid (i.e., does not get rejected).

One of the solution is to replicate the generation process and use only discrete variables:

import numpy as np
import pymc3 as pm
import theano

dice_sizes = np.array((4, 6, 8, 12, 20))
p_dice = np.ones(len(dice_sizes)) / len(dice_sizes)
p_roll = []
for i in dice_sizes:
    p_temp = np.ones(max(dice_sizes)) / i
    p_temp[i:] = 0.
p_roll = theano.shared(np.asarray(p_roll))

observed = 6
with pm.Model() as m:
    dice = pm.Categorical('dice', p=p_dice, testval=4)
    roll = pm.Categorical('roll', p=p_roll[dice], observed=observed-1)
    trace_m = pm.sample(10000)

np.histogram(trace_m['dice'], np.arange(len(dice_sizes)+1), density=True)
# ==> (array([0.     , 0.38515, 0.2955 , 0.2026 , 0.11675]),
#      array([0, 1, 2, 3, 4, 5]))

which seems to work well also for the follow up question in the book by replace observed with observed = np.array([6, 8, 7, 7, 5, 4]).

1 Like

Thanks for looking it up. I ended up doing something similar to your suggestion:

dice_sizes = np.array((4, 6, 8, 12, 20))

with pm.Model() as m_3_1:
        obs = pm.Data('data', [6])
        dice = pm.Categorical('dice', p=np.ones(len(dice_sizes)), testval=2)
        dice_size = theano.shared(dice_sizes)[dice]
        roll = pm.DiscreteUniform('roll', lower=1, upper=dice_size, observed=obs)

I just feel it would have been more clean to use the Mixture object. Is this a bug or should it be considered standard behavior?

Well you can use a mixture with a Dirichlet prior that concentrated the weight to 0 or 1:

observed = 6
valid_dice = observed <= dice_sizes
weights_prior = np.ones(sum(valid_dice))/10.

with pm.Model() as m:
    dice = pm.DiscreteUniform.dist(lower=[1], upper=dice_sizes[valid_dice])
    weights = pm.Dirichlet('weights', a=weights_prior)
    roll = pm.Mixture('roll', w=weights, comp_dists=dice, observed=observed)

    trace_m = pm.sample(1000, tune=10000)
1 Like