Problem with pymc3.Multinomial() distribution?

Dear PyMC3 developers:
Not sure what’s going on with pm.Multinomial() distribution, or it is my coding mistake as I’m new to PyMC3. I was trying to fit a Bayesian Hierarchical Model with the Yogurt data - starting with 2 households to 5 households. As the number of households increase from 2 to 5, the draw values from my multivariate alpha random variable getting smaller and smaller. Here alpha dimension is (number of households x 3. I copied the data in the code below up to 5 households for testing. Each response y from a household contains 4 values (4 choices).There are multiple rows of y for each household. Alpha represents the choice preferences from different households, one column less than y’s column because that level serves as reference in coding. Here is the specific code for 5 households case.

import theano.tensor as tt
import pymc3 as pm
import numpy as np

num_house = 5
np.random.seed(123)
mu_mean = np.zeros(3)
mu_cov = np.array([[625, 200, 200],[200,625,200], [200, 200, 625]])

# choices per household index
choices_index = [0]*8 + [1]*20 + [2]*15 +[3]*14 + [4]*14 

# Get y data = (71 x 4) for 5 households
Yoplait = [0]*8 + [1]*4 + [0]*19 +[1]*8 + [0]*13 + [1]*3 + [0]*2 + [1]*3 + [0]*3 + [1]*2 + [0]*5 + [1] 
Yoplait = np.array(Yoplait).reshape(71,1)
Dannon = [0] + [1]*6 + [0]*21 + [1]*3 + [0]*9 + [1]*3 + [0]*2 + [1]*2 + [0]*3 + [1]*2 +[0]*4 + [1]+[0]*3+[1]*3+[0]*2+[1]*5+[0] 
Dannon = np.array(Dannon).reshape(71,1)
Weight = [1] + [0]*6 + [1] + [0]*4 + [1]*16 + [0]*11 + [1] + [0]*15 + [1] + [0]*15
Weight = np.array(Weight).reshape(71,1)
Hiland = [0]*43 + [1]*2 + [0]*2 + [1]*3 + [0]*21
Hiland = np.array(Hiland).reshape(71,1)
y = np.concatenate((Yoplait,Dannon,Weight,Hiland),axis = 1)

if __name__ == '__main__':
    Yogurt_Model = pm.Model()
    with Yogurt_Model:
        
        # Specify prior distribution for population mean of alpha - utility parameters for household
        mu = pm.MvNormal('mu', mu = mu_mean, cov = mu_cov, shape = len(mu_mean))
        
        # prior distribution for covariance of alpha 
        packed_L = pm.LKJCholeskyCov('packed_L', n = 3, eta = 2., sd_dist = pm.HalfCauchy.dist(2.5))
        L = pm.expand_packed_triangular(3,packed_L)
        
        # prior distribution for alpha
        alpha = pm.MvNormal('alpha', mu = mu, chol = L, shape = (num_house,len(mu_mean)))
        
        # Reshape alpha to have repeated alpha rows for each household
        alpha_reshaped = pm.Deterministic('alpha_reshaped', alpha[choices_index])
        
        # add the zero column for the reference level to alpha_reshaped
        zero_vec = np.zeros((len(choices_index),1))
        alpha_reshaped = tt.concatenate([alpha_reshaped,zero_vec], axis = 1)
                                                       
        # use softmax function to calculate p used in multinomial distribution 
        p = pm.Deterministic('p', tt.nnet.softmax(alpha_reshaped))
      
        #Likelihood
        Like = pm.Multinomial('Like', n= 1, p = p, observed = y)
        step = pm.Metropolis()   
        trace = pm.sample(5000, tune = 1000, step = step)  
        
        print(trace['alpha'][2000:5000].mean(axis=0))
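
For reference, the zero-column / reference-level construction I'm using can be sanity-checked in plain NumPy, independent of PyMC3 (a minimal sketch; the `softmax` helper here stands in for `tt.nnet.softmax`):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))  # numerically stable
    return e / e.sum(axis=-1, keepdims=True)

alpha_row = np.array([1.0, 2.0, 0.5])        # 3 free utilities for one observation
full = np.concatenate([alpha_row, [0.0]])    # append zero for the reference level
p = softmax(full)

# probabilities sum to 1, and the reference level gets 1 / (1 + sum(exp(alpha)))
assert np.isclose(p.sum(), 1.0)
assert np.isclose(p[-1], 1.0 / (1.0 + np.exp(alpha_row).sum()))
```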

The alpha means I got for 5 households are

[[ 1.79400609e-08 -4.60850162e-08 6.50863713e-09]
[ 1.77725171e-08 -1.05121019e-08 2.82981168e-09]
[ 1.81173138e-08 1.78398032e-08 2.53601684e-08]
[ 1.76738471e-08 1.70500670e-08 2.45552148e-08]
[ 1.78436486e-08 -3.60380436e-08 1.99448907e-08]]

Modifying a few dimensions in the data and rerunning the same code with 4, 3, and 2 households, I got gradually increasing alpha means.

For 4 households,
[[-0.01718291 -0.00599668 0.03564388]
[-0.01047885 -0.00710468 0.03176734]
[ 0.00345223 -0.00845897 0.02691133]
[-0.0044705 -0.00827159 -0.0268496 ]]

For 3 households,
[[0.90416004 3.72164376 2.18054261]
[2.44950274 0.98309626 4.14634506]
[2.55693314 2.47482759 0.67979624]]

For 2 households,
[[ 2.69389092 6.53143594 5.29746544]
[ 4.18144832 -3.51906643 5.71260768]]

The entire yogurt data set has 100 households. If I run the entire data set, the alpha values are all zeros.
Thank you very much for shedding light on what's going on with my example! I truly appreciate your help!

Metropolis is likely being very inefficient at exploring the high-dimensional space. Also, since you have n=1, you can use Categorical instead of Multinomial. Here is a small rewrite of the model that seems to work:

with pm.Model() as Yogurt_Model:

    # Specify prior distribution for population mean of alpha - utility parameters for household
    mu = pm.MvNormal('mu', mu=mu_mean, cov=mu_cov, shape=len(mu_mean))

    # prior distribution for covariance of alpha
    packed_L = pm.LKJCholeskyCov(
        'packed_L', n=3, eta=2., sd_dist=pm.HalfNormal.dist(2.5))
    L = pm.expand_packed_triangular(3, packed_L)

    # prior distribution for alpha using a non-centered parameterization
    alpha_ = pm.Normal('alpha_', 0., 1., shape=(num_house, len(mu_mean)))
    alpha = alpha_@L + mu
    
    # Reshape alpha to have repeated alpha rows for each household
    alpha_reshaped = pm.Deterministic('alpha_reshaped', alpha[choices_index])

    # add the zero column for the reference level to alpha_reshaped
    zero_vec = np.zeros((len(choices_index), 1))
    alpha_reshaped = tt.concatenate([alpha_reshaped, zero_vec], axis=1)

    # use softmax function to calculate p used in multinomial distribution
    p = pm.Deterministic('p', tt.nnet.softmax(alpha_reshaped))

    # Likelihood
    Like = pm.Categorical('Like', p=p, observed=np.argmax(y, axis=1))
    trace = pm.sample(1000, tune=1000, return_inferencedata=True)

print(trace.posterior['alpha'].mean(axis=1))
<xarray.DataArray 'alpha' (chain: 2, alpha_dim_0: 5, alpha_dim_1: 3)>
array([[[ 0.66873137,  4.33626143,  2.8985855 ],
        [ 3.31531368,  0.12725865,  4.872073  ],
        [ 3.68108074,  3.36640514,  1.36393261],
        [-0.08666305,  0.45458892, -1.08253757],
        [ 4.12301146,  4.43430025,  0.21736631]],

       [[ 0.81179725,  4.16839674,  2.71980007],
        [ 3.73765871,  0.70367868,  5.21265732],
        [ 3.83967391,  3.54716772,  1.53611838],
        [-0.09483954,  0.45383569, -1.14647242],
        [ 4.03947381,  4.38736543,  0.02006402]]])
Coordinates:
  * chain        (chain) int64 0 1
  * alpha_dim_0  (alpha_dim_0) int64 0 1 2 3 4
  * alpha_dim_1  (alpha_dim_1) int64 0 1 2
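
To see why the non-centered parameterization above is valid, here is a minimal NumPy sketch: transforming standard normal draws by the Cholesky factor reproduces the target covariance (this uses the row-vector convention `z @ L.T`, where `Sigma = L @ L.T`; the values of `Sigma` and `mu` are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.5, 0.2],
                  [0.3, 0.2, 1.0]])
L = np.linalg.cholesky(Sigma)   # lower triangular, Sigma = L @ L.T
mu = np.array([1.0, -1.0, 0.5])

z = rng.standard_normal((200_000, 3))  # standard normal draws (like alpha_)
x = mu + z @ L.T                       # non-centered transform

# empirical covariance of the transformed draws matches Sigma
emp_cov = np.cov(x, rowvar=False)
assert np.max(np.abs(emp_cov - Sigma)) < 0.05
```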
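
The Multinomial-to-Categorical switch works because with n=1 the multinomial log-pmf of a one-hot observation reduces to the log-probability of the chosen category, which is exactly what `np.argmax` recovers. A quick check (example values made up):

```python
import numpy as np

p = np.array([0.2, 0.5, 0.1, 0.2])  # choice probabilities for one row
y = np.array([0, 1, 0, 0])          # one-hot observation (a single trial, n = 1)

# Multinomial log-pmf with n = 1: the combinatorial coefficient is 1,
# so it collapses to sum(y * log(p)) ...
multinomial_logp = np.sum(y * np.log(p))
# ... which equals the Categorical log-probability of the chosen index
categorical_logp = np.log(p[np.argmax(y)])
assert np.isclose(multinomial_logp, categorical_logp)
```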

@junpenglao Thank you very much for suggesting using Categorical! Will give it a try for the entire 100-household case. Thank you!

@junpenglao Thank you very much! Your code works for the entire data set, which contains 100 households: 100 3-component alpha distributions, each alpha representing an individual household. I also added 2 fixed coefficients (in classical statistics terminology), a coefficient distribution with 2 components, making this a mixed logit model. It works well. Thanks again!
