(This is approximately a mirror of my question on Stack Exchange. I'm using PyMC3 for inference, and I thought this community might have a higher density of experienced opinions.)
I'm trying to develop a replacement for a \chi^2 test that uses Bayesian parameter estimation in a hierarchical model instead of a frequentist point estimate (analogous to replacing a t-test with BEST).
The original problem involves an experiment with two groups of participants, each of whom performs a repeated-measures task on which three types of error can be made. I want to estimate the between-group difference in the distribution of error types.
The model structure I’m using:
\mu_{population, i} \sim \mathcal{N}(0, 100)
\sigma_{population, i} \sim \mathcal{HalfCauchy}(5)
\mu_{group, i} \sim \mathcal{N}(\mu_{population, i}, \sigma_{population, i})
\sigma_{group, i} \sim \mathcal{HalfCauchy}(5)
\mu_{subject, i} \sim \mathcal{N}(\mu_{group, i}, \sigma_{group, i})
\theta_{subject, i} = \frac {e^{\mu_{subject, i}}} {\sum_{\hat{i}} e^{\mu_{subject, \hat{i}}}}
k_{subject} \sim \mathcal{Multi}(\vec{\theta}_{subject}, n_{subject})
Two comments on the choices I've made; I'd love to hear any further criticisms:

1. Groups and subjects are technically hierarchical in this experiment, which is an assumption that's not possible to encode in a standard \chi^2 test as far as I know; but since all subjects come from the same originating pool, it makes sense to have the population-level prior.

2. It's difficult to work with priors over Dirichlet parameters, so instead of using the Dirichlet conjugate prior for the multinomial distribution, I've followed a suggestion I've seen from Gelman and others to apply a softmax to normally distributed variables to produce the multinomial parameter.

   A. Interestingly, when the per-group parameters are normally distributed, the sum of their rescaled squared differences is \chi^2-distributed, which makes a very natural contrast to sample the posterior over, as it mirrors the procedure and assumptions involved in the frequentist \chi^2 test: \sum_j { \frac {(\mu_{0, j} - \mu_{1, j})^2} {\sigma_{0, j}^2 + \sigma_{1, j}^2} }
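As a sanity check on the softmax choice, here's a minimal NumPy sketch (names are mine, not from the model) of normally distributed scores being mapped onto the simplex:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(mu):
    # Subtract the row max for numerical stability; softmax is
    # invariant to adding a constant to every component, which is
    # also why the raw mu scores are only identified up to a shift.
    z = np.exp(mu - mu.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)

# Three error types, as in the model: draw subject-level scores
# and map each subject's row to multinomial probabilities.
mu_subject = rng.normal(loc=0.0, scale=1.0, size=(5, 3))
theta_subject = softmax(mu_subject)

print(theta_subject.sum(axis=1))  # each row sums to 1
```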
So my intent is to sample the posterior over this contrast. This brings me to question 1: have I made appropriate choices in this procedure?
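For concreteness, once a trace is in hand the contrast is just arithmetic over the per-group samples. A sketch with a fake trace array (the shapes and names are my assumptions, not PyMC3 output):

```python
import numpy as np

rng = np.random.default_rng(1)

# Pretend posterior samples with shape (draws, groups, error_types).
draws, num_groups, num_types = 1000, 2, 3
mu_group = rng.normal(size=(draws, num_groups, num_types))
sigma_group = np.abs(rng.normal(size=(draws, num_groups, num_types)))

# Per-draw chi^2-style contrast:
#   sum_j (mu_0j - mu_1j)^2 / (sigma_0j^2 + sigma_1j^2)
effect_size = (((mu_group[:, 0] - mu_group[:, 1]) ** 2)
               / (sigma_group[:, 0] ** 2 + sigma_group[:, 1] ** 2)).sum(axis=1)

print(effect_size.shape)  # one value per posterior draw
print(np.percentile(effect_size, [2.5, 50, 97.5]))
```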
The second question is more directly related to PyMC3 and HMC. When sampling from this model, I run into convergence issues, even using the reparameterization trick for the normal distributions. I'm using a multinomial distribution as the final output likelihood; would using a categorical distribution instead help with the convergence if the number of participants is relatively small (<50)?
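One observation relevant to that question: assuming all of a subject's trials share the same \theta, the multinomial and categorical likelihoods differ only by the multinomial coefficient, which is constant in the parameters, so the two should induce the same posterior. A quick numerical check (SciPy used only for the reference value):

```python
import numpy as np
from math import lgamma
from scipy.stats import multinomial

theta = np.array([0.5, 0.3, 0.2])     # fixed error-type probabilities
counts = np.array([7, 2, 1])          # e.g. one subject's error counts
n = counts.sum()

# Sum of categorical log-probabilities over the individual trials.
categorical_logp = (counts * np.log(theta)).sum()

# Multinomial log-pmf = categorical sum + log multinomial coefficient,
# and the coefficient does not depend on theta.
log_coef = lgamma(n + 1) - sum(lgamma(k + 1) for k in counts)
print(categorical_logp + log_coef)
print(multinomial.logpmf(counts, n=n, p=theta))  # should match
```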
Model code:
import pymc3 as pm
import theano as T

# num_types, num_groups, num_subjects, groups, subjects, ns, y
# come from the experiment data.
with pm.Model() as chisq_model:
    # Population-level priors over the (pre-softmax) error-type scores.
    mu_population = pm.Normal('mu_population', mu=0, sd=100,
                              shape=num_types)
    sigma_population = pm.HalfCauchy('sigma_population', 5,
                                     shape=num_types)

    # Non-centered (reparameterized) group-level means.
    mu_group_offset = pm.Normal('mu_group_offset', mu=0, sd=1,
                                shape=(num_groups, num_types))
    mu_group = pm.Deterministic('mu_group',
                                mu_population + sigma_population * mu_group_offset)
    sigma_group = pm.HalfCauchy('sigma_group', 5,
                                shape=(num_groups, num_types))

    # Non-centered subject-level means, indexed by each subject's group.
    mu_subject_offset = pm.Normal('mu_subject_offset', mu=0, sd=1,
                                  shape=(num_subjects, num_types))
    mu_subject = pm.Deterministic('mu_subject',
                                  mu_group[groups] + mu_subject_offset * sigma_group[groups])

    # Softmax to map scores onto the simplex.
    theta_subject = pm.Deterministic('theta_subject',
                                     T.tensor.exp(mu_subject)
                                     / T.tensor.exp(mu_subject).sum(axis=1, keepdims=True))

    k_subject = pm.Multinomial('k_subject',
                               n=ns, p=theta_subject[subjects],
                               observed=y)

    # Contrasts of interest.
    difference_in_means = pm.Deterministic('difference in means',
                                           ((mu_group[0] - mu_group[1]) ** 2).sum())
    effect_size = pm.Deterministic('effect size',
                                   (((mu_group[0] - mu_group[1]) ** 2)
                                    / (sigma_group[0] ** 2 + sigma_group[1] ** 2)).sum())