(This is approximately a mirror of my question on Stack Exchange. I’m using PyMC3 for inference, and I thought this community might have a higher density of experienced opinions.)
I’m trying to develop a replacement for a $\chi^2$ test that uses Bayesian parameter estimation in a hierarchical model instead of a frequentist point estimate (analogous to replacing a t-test with BEST).
The original problem involves an experiment with two groups of participants, each of whom performs a repeated-measures task on which three types of error can be made. I want to estimate the between-group difference in the distribution of error types.
The model structure I’m using:
$$\mu_{population,\,i} \sim \mathcal{N}(0,\,100)$$
$$\sigma_{population,\,i} \sim \mathrm{HalfCauchy}(5)$$
$$\mu_{group,\,i} \sim \mathcal{N}(\mu_{population,\,i},\,\sigma_{population,\,i})$$
$$\sigma_{group,\,i} \sim \mathrm{HalfCauchy}(5)$$
$$\mu_{subject,\,i} \sim \mathcal{N}(\mu_{group,\,i},\,\sigma_{group,\,i})$$
$$\theta_{subject,\,i} = \frac{e^{\mu_{subject,\,i}}}{\sum_{\hat{i}} e^{\mu_{subject,\,\hat{i}}}}$$
$$\vec{k}_{subject} \sim \mathrm{Multinomial}(\vec{\theta}_{subject},\,n_{subject})$$
Two comments on the choices I’ve made; would love to hear any further criticisms:
- Groups and subjects are technically hierarchical in this experiment, an assumption that, as far as I know, can’t be encoded in a standard $\chi^2$ test; and since all subjects come from the same originating pool, it makes sense to have the population-level prior.
- It’s difficult to work with priors over Dirichlet parameters, so instead of using the Dirichlet conjugate prior for the multinomial, I’ve followed a suggestion I’ve seen from Gelman and others: apply a softmax to normally distributed variables to produce the multinomial parameter vector (see the sketch just below).
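A minimal NumPy sketch of that construction, just to make it concrete (the names here are illustrative, not from the model code below):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    # Subtract the max before exponentiating for numerical stability;
    # softmax is unchanged by adding a constant to every input.
    z = np.exp(x - x.max())
    return z / z.sum()

mu = rng.normal(loc=0.0, scale=1.0, size=3)  # one unconstrained normal per error type
theta = softmax(mu)                          # on the simplex: all positive, sums to 1
print(theta, theta.sum())
```

(One consequence of that shift-invariance: the $\mu$ vector itself is only identified up to an additive constant.)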
Interestingly, when the per-group parameters are normally distributed, the sum of their rescaled squared differences is $\chi^2$-distributed, which makes a very natural contrast to sample the posterior over, since it mirrors the procedure and assumptions of the frequentist $\chi^2$ test:

$$\sum_j \frac{(\mu_{0,\,j} - \mu_{1,\,j})^2}{\sigma_{0,\,j}^2 + \sigma_{1,\,j}^2}$$
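Spelling out the step behind that claim (under the assumption that the two group-level means are independent with a common center, i.e. the “no difference” case, and writing $J$ for the number of error types):

$$\mu_{0,\,j} - \mu_{1,\,j} \sim \mathcal{N}\!\left(0,\ \sqrt{\sigma_{0,\,j}^2 + \sigma_{1,\,j}^2}\right) \quad\Rightarrow\quad \sum_{j=1}^{J} \frac{(\mu_{0,\,j} - \mu_{1,\,j})^2}{\sigma_{0,\,j}^2 + \sigma_{1,\,j}^2} \sim \chi^2_J$$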
So my intent is to sample the posterior over this contrast. This gets me to question 1: have I made appropriate choices in this procedure?
The second question is more directly related to PyMC3 and HMC. When sampling from this model, I run into convergence issues, even with the non-centered (offset) reparameterization of the normal distributions. I’m currently using a multinomial distribution as the output likelihood; would switching to a categorical distribution (one observation per trial) instead help with convergence when the number of participants is relatively small (< 50)? A sketch of that variant follows the model code below.
Model code:
```python
import pymc3 as pm
import theano.tensor as tt

# Assumed already defined from the data:
#   num_types, num_groups, num_subjects -- dimensions
#   groups   -- group index for each subject, shape (num_subjects,)
#   subjects -- subject index for each row of y
#   ns       -- number of trials for each row of y
#   y        -- observed error-type counts, shape (len(subjects), num_types)

with pm.Model() as chisq_model:
    # Population-level priors shared by both groups
    mu_population = pm.Normal('mu_population', mu=0, sd=100,
                              shape=num_types)
    sigma_population = pm.HalfCauchy('sigma_population', 5,
                                     shape=num_types)

    # Non-centered (offset) parameterization of the group-level means
    mu_group_offset = pm.Normal('mu_group_offset', mu=0, sd=1,
                                shape=(num_groups, num_types))
    mu_group = pm.Deterministic(
        'mu_group', mu_population + sigma_population * mu_group_offset)
    sigma_group = pm.HalfCauchy('sigma_group', 5,
                                shape=(num_groups, num_types))

    # Non-centered parameterization of the subject-level means
    mu_subject_offset = pm.Normal('mu_subject_offset', mu=0, sd=1,
                                  shape=(num_subjects, num_types))
    mu_subject = pm.Deterministic(
        'mu_subject',
        mu_group[groups] + mu_subject_offset * sigma_group[groups])

    # Row-wise softmax over error types; tt.nnet.softmax is numerically
    # stabler than exponentiating and normalizing by hand
    theta_subject = pm.Deterministic('theta_subject',
                                     tt.nnet.softmax(mu_subject))

    k_subject = pm.Multinomial('k_subject',
                               n=ns, p=theta_subject[subjects],
                               observed=y)

    # Posterior contrasts between the two groups
    difference_in_means = pm.Deterministic(
        'difference_in_means', ((mu_group[0] - mu_group[1]) ** 2).sum())
    effect_size = pm.Deterministic(
        'effect_size',
        (((mu_group[0] - mu_group[1]) ** 2)
         / (sigma_group[0] ** 2 + sigma_group[1] ** 2)).sum())
```
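For reference, I sample with a raised acceptance target, which is a common first response to divergences in hierarchical models (the specific numbers here are illustrative):

```python
with chisq_model:
    trace = pm.sample(2000, tune=2000, target_accept=0.9)  # default target_accept is 0.8
```

And this is roughly what the categorical variant from question 2 would look like, replacing `k_subject` inside the model block (a sketch; `trial_subjects` and `y_trials`, the per-trial subject indices and observed error types, are my assumed names for a per-trial data layout):

```python
# One row per individual trial instead of aggregated counts per subject.
# As far as I can tell, the multinomial over aggregated counts and the
# categorical over the underlying trials give the same likelihood (up to
# a constant), so I'm not sure this alone would change the geometry.
k_trial = pm.Categorical('k_trial',
                         p=theta_subject[trial_subjects],  # assumed: subject index per trial
                         observed=y_trials)                # assumed: error type (0..2) per trial
```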