Overall, your model is over-specified.
So my approach to this kind of problem is usually the following:
- Understand the structure of the input. I usually do a pairplot of the predictors. In this case, it is clear that you have predictors that are common among departments, and you have N departments (usually treated as a random effect).
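For example, a quick sketch of that first look (assuming your predictors are the columns of X_full, as in the code below; seaborn is just one convenient way to do it):

import seaborn as sns

# pairwise scatter plots of all predictors, to spot shared structure and collinearity
sns.pairplot(X_full)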
- Start with a simple model that captures the structure of the data. In your case, since it is a multinomial regression, you need to be careful regarding the normalization in the softmax. I find that the most stable way is to model k-1 categories and stack a column of 0s at the end:
import numpy as np
import pandas as pd
import pymc3 as pm
import theano.tensor as tt

idx = pd.IndexSlice  # MultiIndex slicing helper (assuming this is what your idx refers to)

# make sure there is no missing data here
groupX = X_full.loc[idx['seinemaritime', :], :].iloc[:, :n_regressors - 1].values
Ndpt = len(np.unique(dptmts_idx))
date_idx, date_names = X_full.index.get_level_values(1).factorize(sort=True)

with pm.Model() as hier_model:
    # department-level intercepts and coefficients, for the k-1 modeled categories
    dpt_interp = pm.Normal('dpt_interp', 0., 1., shape=(Ndpt, n_parties - 1))
    fixed_effect = pm.Normal('fixed', 0., 1., shape=(n_regressors - 1, n_parties - 1))
    random_effect = pm.Normal('random', 0., 1., shape=n_parties - 1)

    # linear predictor: department intercept + shared regressors + unemployment effect
    results_est = dpt_interp[dptmts_idx] + \
                  tt.dot(groupX, fixed_effect)[date_idx] + \
                  X_full['chomage'].values[:, None] * random_effect
    # stack a reference column of zeros so the softmax is identified
    results_est = tt.concatenate([results_est,
                                  tt.zeros((X_full.shape[0], 1))],
                                 axis=1)
    probs = pm.Deterministic('probs', tt.nnet.softmax(results_est))

    likelihood = pm.Multinomial('likelihood',
                                n=y_full.sum(axis=1),
                                p=probs,
                                observed=y_full.values)

    # might need to increase tuning
    trace = pm.sample(1000, tune=1000, cores=4, random_seed=RANDOM_SEED)
- Add hyperpriors. I don't think it is too useful to put hyperpriors on the mus of the coefficients (again, given the softmax), but it might help to put priors on the sigmas (even hierarchical ones); see the sketch below.
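To illustrate that last point, here is a minimal sketch of hierarchical priors on the sigmas, reusing the variables from the model above (the HalfNormal scales and the non-centered parametrization are just illustrative choices, not taken from your code):

with pm.Model() as hier_model_sigmas:
    # hyperprior on the spread of the department intercepts
    sigma_dpt = pm.HalfNormal('sigma_dpt', 1.)
    # non-centered parametrization: standard-normal offsets scaled by the hyperprior
    dpt_offset = pm.Normal('dpt_offset', 0., 1., shape=(Ndpt, n_parties - 1))
    dpt_interp = pm.Deterministic('dpt_interp', sigma_dpt * dpt_offset)

    # same idea for the scale of the shared coefficients
    sigma_fixed = pm.HalfNormal('sigma_fixed', 1.)
    fixed_effect = pm.Normal('fixed', 0., sigma_fixed,
                             shape=(n_regressors - 1, n_parties - 1))

    # the rest of the model (results_est, softmax, Multinomial likelihood)
    # stays exactly as in the model above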