Making a vector of variables from a list

Hi all, I’m working on some hierarchical regressions, but a few of my groups have only one member. I’m allowing a different variance for each group, but a group with a single member has no variance to estimate.

To generate the individual-level variables, I loop over each individual and define the variable with an if statement, depending on whether its group has one or multiple members. Full code for an example with made-up data is attached; here’s a snippet with the model.

import pymc3 as pm
import theano.tensor as tt

with pm.Model() as model:
    # Baseline intercept
    a0 = pm.Normal('baseline', mu=0., sd=5.)

    # Group-level predictors (hierarchical_normal is a non-centered
    # helper defined in the attached file)
    a_group       = hierarchical_normal('group', 5)
    mu_individual = a_group[groups]

    sigma_indiv_upp = pm.HalfNormal('sigma_individual_upper', sd=1.)
    sigma_indiv     = pm.HalfNormal('sigma_individual', sd=sigma_indiv_upp, shape=4)

    # Individual-level effects: individuals in `skip` belong to
    # single-member groups, so their effect is pinned to the group mean
    a_individual = []
    for i in range(nIndiv):
        if i in skip:
            a_individual.append(pm.Deterministic('individual_{}'.format(i), mu_individual[i]))
        else:
            delta = pm.Normal('delta_individual_{}'.format(i), 0., 1.)
            a_individual.append(pm.Deterministic('individual_{}'.format(i),
                                                 mu_individual[i] + delta * sigma_indiv[groups[i]]))
    a_individual = tt.as_tensor_variable(a_individual)

    eta        = a0 + a_individual[indivIdx]
    p          = pm.math.sigmoid(eta)
    likelihood = pm.Binomial("obs", N, p, observed=obs)

This seems to work fine, but the individual coefficients aren’t treated as a vector the way they would be if I had created them with the shape argument set. So when I do a traceplot, for example, I get a separate subplot for each individual coefficient rather than all of them on one subplot.

Is there a better way to define the model, or if not, is there a way to make the individual coefficients behave like a vector for plotting purposes? Thanks

toyExample.py (2.9 KB)

You can try something like below:

skip = np.ones((nIndiv,), dtype=int)  # mask: 1 = estimate a deviation, 0 = pin to the group mean
skip[nIndiv-1] = 0
with pm.Model() as model:
    ...
    sigma_indiv  = pm.HalfNormal('sigma_individual', sd=sigma_indiv_upp, shape=5)
    delta        = pm.Normal('delta_individual', 0., 1., shape=nIndiv)
    # for the masked individuals the deviation term is multiplied by zero,
    # so their effect is just the group mean
    a_individual = pm.Deterministic('individual_eff', a_group[groups] + delta * sigma_indiv[groups] * skip)
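The masking trick can be sanity-checked with plain NumPy; all the values below are made up, and `groups`, `a_group`, etc. just mirror the names in the snippet:

```python
import numpy as np

nIndiv = 5
groups = np.array([0, 0, 1, 1, 2])      # individual -> group; group 2 has one member
skip   = np.ones((nIndiv,), dtype=int)  # 1 = estimate a deviation, 0 = pin to the group mean
skip[nIndiv - 1] = 0

a_group     = np.array([0.5, -0.3, 1.2])            # pretend group means
sigma_indiv = np.array([0.8, 0.6, 0.4])             # pretend group scales
delta       = np.array([0.1, -0.2, 0.3, 0.4, 9.9])  # pretend standardized deviations

a_individual = a_group[groups] + delta * sigma_indiv[groups] * skip

# the masked individual's deviation is multiplied by zero,
# so its effect is exactly its group mean regardless of delta
print(a_individual[-1])  # 1.2
```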

Thanks for taking a look! I thought about something like this, but with this method a prior is specified for sigma_individual_4 and for delta_individual_16, when I think these variables should be set to zero deterministically. I’m not sure whether this actually matters, though.

Doing so, you will get samples of these two variables that simply follow their prior distributions. Since they contribute nothing to the model logp (besides an additive constant), they have no effect on the NUTS or MH sampler.
You can also keep the for loop like you did, but create a single deterministic using tt.stack. I guess it depends on how many skips there are in your real case. If there are lots of subjects with only one measurement (i.e., you are skipping a lot), using a for loop and then tt.stack might be more efficient.
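In NumPy terms the loop-then-stack pattern looks like the sketch below, with np.stack playing the role of tt.stack and all values made up:

```python
import numpy as np

rng = np.random.default_rng(0)

nIndiv = 5
groups = np.array([0, 0, 1, 1, 2])  # individual -> group; group 2 has one member
skip   = {4}                        # individuals whose group has a single member

mu_individual = np.array([0.5, 0.5, -0.3, -0.3, 1.2])  # group means, indexed per individual
sigma_indiv   = np.array([0.8, 0.6, 0.0])              # per-group scales (unused for group 2)

a_individual = []
for i in range(nIndiv):
    if i in skip:
        # single-member group: effect is pinned to the group mean
        a_individual.append(mu_individual[i])
    else:
        delta = rng.normal(0., 1.)
        a_individual.append(mu_individual[i] + delta * sigma_indiv[groups[i]])

# stack the list into one vector, as tt.stack would,
# so downstream code sees a single vector-valued quantity
a_individual = np.stack(a_individual)
print(a_individual.shape)  # (5,)
```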

Ok got it!

The real example is estimating opinions in congressional districts from a large national survey, the Cooperative Congressional Election Study. So all the groups with one member are states with only one congressional district. There are only 8 states (counting Washington, DC) with a single district, so it’s probably not that big of a deal efficiency-wise.

It works great for the full example too. A short summary and the source code are linked below; thanks again!

https://www.unburythelead.com/blog/2017/12/27/better-inferences-at-the-congressional-district-level
