Multinomial hierarchical regression with multiple observations per group ("Bad energy issue")

Overall, your model is over-specified.

So my approach to these kind of problem is usually the following:

  1. Understand the structure of the input. I usually do a pairplot of the predictors. In this case, it is clear that you have predictors that are common among departments, and you have N departments (usually treated as random effect)

  2. Start with a simple model that capture the structure of the data.
    In your case, since it is a multinomial regression, you need to be careful re normalization in the softmax. I find that the most stable way is to model k-1 categories and stack a columns of 0s at the end:

# make sure there is no missing data here
groupX = X_full.loc[idx['seinemaritime',:], :].iloc[:, :n_regressors-1].values

Ndpt = len(np.unique(dptmts_idx))
date_idx, date_names = X_full.index.get_level_values(1).factorize(sort=True)

with pm.Model() as hier_model:
    dpt_interp = pm.Normal('dpt_interp', 0., 1., shape=(Ndpt, n_parties-1))
    fixed_effect = pm.Normal('fixed', 0., 1., shape=(n_regressors-1, n_parties-1))
    random_effect = pm.Normal('random', 0., 1., shape=n_parties-1)
    
    results_est = dpt_interp[dptmts_idx] +\
                tt.dot(groupX, fixed_effect)[date_idx]+\
                X_full['chomage'].values[:, None]*random_effect
    results_est = tt.concatenate(tensor_list=[results_est, 
                                              tt.zeros((X_full.shape[0], 1))],
                                 axis=1)
    
    probs = pm.Deterministic('probs', tt.nnet.softmax(results_est))
    likelihood = pm.Multinomial('likelihood',
                                n=y_full.sum(axis=1),
                                p=probs,
                                observed=y_full.values)
    # might need to increase tunning
    trace = pm.sample(1000, tune=1000, cores=4, random_seed=RANDOM_SEED)
  1. add hyperpriors. I dont think it is too useful to put hyperprior on the mus of the coefficients (again, given the softmax), but it might help to put prior on the sigmas (even hierarchical ones)
1 Like