Panel model fitting issue

I have panel data from 1251 people over 4-time points and want to construct a partially pooled (random effects) panel model. I use lagged regressors and therefore only can use 3-time points; so T=3, total_N =3753, with 1251 unique subjects (each with three observations for the regressors and D.V.). My df is in long format, with both time-variant and invariant regressors.

This is a similar issue to a prior post titled Multinomial with Random Effects on panel data, and thus I try to use practically the same model for my example (Y can be 0,1,2). However, I am having difficulty integrating the hierarchical temporal component into my model as recommended in that post’s proposed solution. I have drawn heavily from the classic radon example, but still am stuck.

My question: how to account for the RE across different time periods AND the multiple observations per person? I want to estimate a single beta per regressor (thus am not trying to estimate random slopes).

My model below does run, however, there are many divergences, and it uses time_idx and not respondents_idx, thus I think it does not properly account for the multi-responses per person nature of my data.

Any suggestions would be greatly welcome.

with pm.Model() as panel_model:
s = pm.HalfStudentT(‘sd_1’, nu=3, sd=186)
b = pm.Normal(‘b’, mu = 0, sd = 1, shape=3)
common_mu = pm.Normal(‘common_mu’, mu=0., sd=2)
common_sd = pm.Exponential(“common_sd”, 1.0)

#could also use dot product, but I like the visual
b1 = pm.Normal(“b1”, mu=common_mu, sigma=common_sd)
b2 = pm.Normal(“b2”, mu=common_mu, sigma=common_sd)
b3 = pm.Normal(“b3”, mu=common_mu, sigma=common_sd)
b4 = pm.Normal(“b4”, mu=common_mu, sigma=common_sd)

mu = (b1* data['X1'] + b2* data['X2'] + b3* data['X3'] + b4* data['X4'])

r_1 = pm.Deterministic('r_1', s*b)

#link func                             #T=3    
p = T.nnet.softmax(mu + r_1[time_idx])
                                                    #Y can be 0,1,2
y = pm.Categorical('y', p=p, observed=data['Y'].values)

If I understand your problem correctly, you need to use a numpy-style fancy slice on your r_1 variable. I set up some data to match your problem like this:

T = 3
N = 1251
k = 4

time_index = np.arange(T)
id_index = np.arange(N)

index = pd.MultiIndex.from_product([id_index, time_index])
data = pd.DataFrame(np.random.normal(size=(T * N, 4)), index=index, columns=[f'X{i+1}' for i in range(k)])
data['Y'] = np.random.normal(size=(T * N))

Then I turn the time dimension of the dataframe’s index into a numerical index using pd.factorize:

time_idxs, times = pd.factorize(data.index.get_level_values(1))

time_idxs is just the integers [0,1,2] repeated N times. If you slice a random variable with it, it will put the 0th value everywhere there is a zero, the 1st value everywhere there is a 1, and the 2nd value everywhere there is a 2. So slicing:

r_1[time_idxs] will give r_1[0], r_1[1], r_1[2], r_1[0], …

This will broadcast the same 3 time values to each individual. You could also use np.tile(np.arange(3), N) if you don’t like to use pandas.


Hi Jesse,

Yes, thank you very much for replying - that is exactly what I needed!

  • To future people dealing with the same issue, look at the way that the long-format df that Jesse made has two indexes one for people (in this example) and one for time. My initial df had only one and I think that was the source of my error when I was trying to factorize it.

My final model looks like

with pm.Model() as panel_model:
common_mu = pm.Normal(‘common_mu’, mu=0., sd=2)
common_sd = pm.Exponential(“common_sd”, 1.0)
s = pm.Exponential(“s”, 1.0)
b = pm.Normal(‘b’, mu = 0, sd = 1, shape=len(set(time_idxs)))
beta = pm.Normal(“beta”, mu=common_mu, sigma=common_sd, shape=(4))

mu =[['X1', 'X2', 'X3', 'X4']].values, beta)

r_1 = pm.Deterministic('r_1', s*b)

#link func        
p = T.nnet.softmax(mu + r_1[[time_idxs]])
y = pm.Categorical('y', p=p, observed=data['Y'].values)

Thanks again for the help!

1 Like