I posted this issue also on Stack Overflow, where I uploaded images of posteriors etc.
I want to compare the Dirichlet hyper parameters of two sets of multinomial (k=3) observations.
I can fit both separately, and they look fine.
The first set of (N=10) observations were simulated from an alpha vector of [.30 .60 .10], with counts varying around 50
import numpy as np import pymc3 as pm import pandas as pd import theano.tensor as tt #model 1 N1=10; c1 = np.asarray([[16,29,4], [16,29,6], [14,30,4], [16,29,3], [16,31,5], [13,29,5], [15,32,5], [15,29,6], [17,31,6], [13,29,4]]); counts1 = np.asarray([49,51,48,48,52,47,52,50,54,46]); with pm.Model() as model1: hyper_param1 = pm.HalfNormal('hyper_param1',10,shape=3); param1 = pm.Dirichlet('param1',a=hyper_param1,shape=(N1,3)); y1 = pm.Multinomial('y1', n=counts1, p=param1, observed=c1); trace1 = pm.sample(10000, tune=10000, init='adapt_diag',target_accept=0.95);
Posteriors and traces look nice, and the Dirichlet distribution using the mean of the posteriors for each dimension of hyper_param1 centred exactly where it should be.
The second set of (N=11) observations were simulated from an alpha vector of [.30 .10 .60], with counts again varying around 50
#model 2 N2 = 11; c2 = np.asarray([[15,4,31], [15,3,30], [15,5,31], [16,6,31], [14,5,29], [16,5,30], [15,6,30], [13,6,30], [14,6,31], [16,6,29], [15,7,29]]); counts2 = np.asarray([50,48,51,53,48,51,51,49,51,51,51]); with pm.Model() as model2: hyper_param2 = pm.HalfNormal('hyper_param2',10,shape=3); param2 = pm.Dirichlet('param2',a=hyper_param2,shape=(N2,3)); y2 = pm.Multinomial('y2', n=counts2, p=param2, observed=c2); trace2 = pm.sample(10000, tune=10000, init='adapt_diag',target_accept=0.95);
Again, posteriors, traces and Dirichlet distribution look fine.
However when I try to combine these into a hierarchical model using indexing, the model barely fits. >100 divergences, the posteriors/traces are a mess and the Dirichlet distributions are fairly uninformative.
## Combined model c_comb = np.asarray([[16,29,4], [16,29,6], [14,30,4], [16,29,3], [16,31,5], [13,29,5], [15,32,5], [15,29,6], [17,31,6], [13,29,4], [15,4,31], [15,3,30], [15,5,31], [16,6,31], [14,5,29], [16,5,30], [15,6,30], [13,6,30], [14,6,31], [16,6,29], [15,7,29]]); counts_comb = np.asarray([49,51,48,48,52,47,52,50,54,46,50,48,51,53,48,51,51,49,51,51,51]); idx = np.asarray([0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1]).astype(int); with pm.Model() as model_comb: hyper_param_comb1 = pm.HalfNormal('hyper_param_comb1',10,shape=3); hyper_param_comb2 = pm.HalfNormal('hyper_param_comb2',10,shape=3); param_comb1 = pm.Dirichlet('param_comb1',a=hyper_param_comb1,shape=(N1,3)); param_comb2 = pm.Dirichlet('param_comb2',a=hyper_param_comb2,shape=(N2,3)); param_comb = tt.concatenate((param_comb1,param_comb2),axis=0); y_comb = pm.Multinomial('y_comb', n=counts_comb, p=param_comb[idx], observed=c_comb); trace_comb = pm.sample(10000, tune=10000, init='adapt_diag',target_accept=0.95);
I’ve tried increasing the acceptance requirement on the tuning (up to .99), increasing the tuning samples, various priors on the hyper_params (gamma, StudentsT, normal, Uniform (restricted to be >0 and >1)), also tried fitting the exponent of the param vectors in case negative values were somehow emerging and causing mayhem… nothing worked. Either I’m doing something daft with the model setup, or perhaps this is some de-centering/centering issue that needs to be accommodated?
PyMC3 version: 3.8 Python version: 3.8 OS: Linux (PopOS v20.10) IDE: Spyder v4.2.1