Hierarchical multinomial model

I also posted this issue on Stack Overflow, where I uploaded images of the posteriors etc.

Link here: bayesian - Hierarchical multinomial model in PyMC3 - Stack Overflow

I want to compare the Dirichlet hyperparameters of two sets of multinomial (k=3) observations.

I can fit both separately, and they look fine.

The first set of (N=10) observations was simulated from an alpha vector of [.30 .60 .10], with counts varying around 50.
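For concreteness, data of this shape can be simulated along these lines (a sketch, not my exact simulation code; the concentration factor of 100 and the seed are assumptions):

# hypothetical simulation sketch: concentrated Dirichlet draws for the
# per-observation probabilities, then Multinomial counts
import numpy as np

rng = np.random.default_rng(0)               # seed is arbitrary
alpha = 100 * np.array([0.30, 0.60, 0.10])   # assumed concentration factor
totals = rng.integers(46, 55, size=10)       # totals varying around 50
probs = rng.dirichlet(alpha, size=10)        # one probability vector per observation
c1_sim = np.array([rng.multinomial(n, p) for n, p in zip(totals, probs)])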

import numpy as np
import pymc3 as pm
import pandas as pd
import theano.tensor as tt

#model 1
N1 = 10
c1 = np.asarray([[16, 29, 4],
                 [16, 29, 6],
                 [14, 30, 4],
                 [16, 29, 3],
                 [16, 31, 5],
                 [13, 29, 5],
                 [15, 32, 5],
                 [15, 29, 6],
                 [17, 31, 6],
                 [13, 29, 4]])

counts1 = np.asarray([49, 51, 48, 48, 52, 47, 52, 50, 54, 46])

with pm.Model() as model1:
    hyper_param1 = pm.HalfNormal('hyper_param1', 10, shape=3)
    param1 = pm.Dirichlet('param1', a=hyper_param1, shape=(N1, 3))
    y1 = pm.Multinomial('y1', n=counts1, p=param1, observed=c1)
    trace1 = pm.sample(10000, tune=10000, init='adapt_diag', target_accept=0.95)

The posteriors and traces look nice, and the Dirichlet distribution built from the posterior mean of each dimension of hyper_param1 is centred exactly where it should be.
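The check here amounts to the following sketch (my actual inspection used plots of the posteriors):

post_mean = trace1['hyper_param1'].mean(axis=0)  # posterior mean of each hyperparameter
print(post_mean / post_mean.sum())               # should land near [0.30, 0.60, 0.10]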

The second set of (N=11) observations was simulated from an alpha vector of [.30 .10 .60], with counts again varying around 50.

#model 2
N2 = 11
c2 = np.asarray([[15, 4, 31],
                 [15, 3, 30],
                 [15, 5, 31],
                 [16, 6, 31],
                 [14, 5, 29],
                 [16, 5, 30],
                 [15, 6, 30],
                 [13, 6, 30],
                 [14, 6, 31],
                 [16, 6, 29],
                 [15, 7, 29]])

counts2 = np.asarray([50, 48, 51, 53, 48, 51, 51, 49, 51, 51, 51])

with pm.Model() as model2:
    hyper_param2 = pm.HalfNormal('hyper_param2', 10, shape=3)
    param2 = pm.Dirichlet('param2', a=hyper_param2, shape=(N2, 3))
    y2 = pm.Multinomial('y2', n=counts2, p=param2, observed=c2)
    trace2 = pm.sample(10000, tune=10000, init='adapt_diag', target_accept=0.95)

Again, posteriors, traces and Dirichlet distribution look fine.

However, when I try to combine these into a single hierarchical model using indexing, the model barely fits: >100 divergences, the posteriors/traces are a mess, and the Dirichlet distributions are fairly uninformative.

#combined model
c_comb = np.asarray([[16, 29, 4],
                     [16, 29, 6],
                     [14, 30, 4],
                     [16, 29, 3],
                     [16, 31, 5],
                     [13, 29, 5],
                     [15, 32, 5],
                     [15, 29, 6],
                     [17, 31, 6],
                     [13, 29, 4],
                     [15, 4, 31],
                     [15, 3, 30],
                     [15, 5, 31],
                     [16, 6, 31],
                     [14, 5, 29],
                     [16, 5, 30],
                     [15, 6, 30],
                     [13, 6, 30],
                     [14, 6, 31],
                     [16, 6, 29],
                     [15, 7, 29]])

counts_comb = np.asarray([49, 51, 48, 48, 52, 47, 52, 50, 54, 46,
                          50, 48, 51, 53, 48, 51, 51, 49, 51, 51, 51])

# group label for each observation: first N1 rows from set 1, last N2 from set 2
idx = np.asarray([0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]).astype(int)

with pm.Model() as model_comb:
    hyper_param_comb1 = pm.HalfNormal('hyper_param_comb1', 10, shape=3)
    hyper_param_comb2 = pm.HalfNormal('hyper_param_comb2', 10, shape=3)

    param_comb1 = pm.Dirichlet('param_comb1', a=hyper_param_comb1, shape=(N1, 3))
    param_comb2 = pm.Dirichlet('param_comb2', a=hyper_param_comb2, shape=(N2, 3))

    param_comb = tt.concatenate((param_comb1, param_comb2), axis=0)

    y_comb = pm.Multinomial('y_comb', n=counts_comb, p=param_comb[idx], observed=c_comb)
    trace_comb = pm.sample(10000, tune=10000, init='adapt_diag', target_accept=0.95)

I’ve tried increasing the acceptance requirement during tuning (up to .99), increasing the number of tuning samples, and various priors on the hyper_params (Gamma, StudentT, Normal, Uniform restricted to >0 and to >1). I also tried fitting the exponent of the param vectors in case negative values were somehow emerging and causing mayhem. Nothing worked. Either I’m doing something daft with the model setup, or perhaps this is a centred/non-centred parameterisation issue that needs to be accommodated?
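For what it’s worth, one alternative in that spirit that I haven’t shown above would be to marginalise the per-observation param vectors out entirely via the Dirichlet-multinomial. A minimal sketch, assuming PyMC3 >= 3.11 (which provides pm.DirichletMultinomial) and reusing c1/c2/counts1/counts2 from above:

# sketch only: marginalising out the latent Dirichlet draws removes the
# funnel-prone per-row probability vectors from the posterior geometry
with pm.Model() as model_marg:
    hyper1 = pm.HalfNormal('hyper1', 10, shape=3)
    hyper2 = pm.HalfNormal('hyper2', 10, shape=3)
    y1 = pm.DirichletMultinomial('y1', n=counts1, a=hyper1, shape=(N1, 3), observed=c1)
    y2 = pm.DirichletMultinomial('y2', n=counts2, a=hyper2, shape=(N2, 3), observed=c2)
    trace_marg = pm.sample(2000, tune=2000, target_accept=0.9)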

PyMC3 version: 3.8
Python version: 3.8
OS: Linux (PopOS v20.10)
IDE: Spyder v4.2.1

I ran your last model using version 3.11.1 of PyMC3, and with target_accept=0.90 it finished in ~2 minutes with a single divergence. I was getting a shape error with your counts, so I had to modify that slightly. Here’s the version I ran.

Thanks for that, Christopher.

What exactly did you change in the counts?

I ran your code (I needed to specify values for N1=10 and N2=11) and it yields similar problems with the posteriors. param_comb1 should have peaks at [.30 .60 .10], which it does, but there is also a large posterior covering pretty much 0 to 1. param_comb2 should have peaks at [.30 .10 .60], but shows just one large messy posterior covering 0 to 1…
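The inspection I’m describing is essentially this sketch:

print(trace_comb['param_comb1'].mean(axis=0))  # rows should sit near [.30, .60, .10]
print(trace_comb['param_comb2'].mean(axis=0))  # rows should sit near [.30, .10, .60]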

I reshaped the counts from (N,) to (N,1) so that they broadcast properly in the likelihood function. I apologize that I did not check the values of the posterior parameters themselves.
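In other words, the change was along these lines (a sketch of the one-line fix, not the full model):

# reshape the totals to a column vector so they broadcast row-wise against
# the (N, 3) probability matrix and the (N, 3) observed counts
counts_comb = counts_comb.reshape(-1, 1)   # (21,) -> (21, 1)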