I found some resources that explain design matrices theoretically (video1, video2,wiki), and according to this stackexchange answer there’s a slight difference between the design matrix and the model matrix, (based on Montgomery, D. (2009). Design and Analysis of Experiments, 7th Edition. John Wiley & Sons Inc.).
This notebook makes use of patsy and PyMC3, but doesn’t explain what’s going on there. Another notebook on a sleep deprivation study is a bit more informative, because it shows how to do only random intercepts/slopes or both, but also doesn’t go into much detail about the why and how. It doesn’t provide an interpretation of the results, unfortunately.
A hierarchical model from this thread is using some indexing scheme that I don’t quite grasp. And just to collect some more useful information, here’s a thread that further discusses hierarchical models.
There seems to be a lack of a step-by-step in-depth guide with best practices on HLMs in PyMC3 that’s beginner-friendly. Mostly it’s just notebooks replicating some calculations without much explanation.
Based on the sleep deprivation study notebook I tried applying it to my case.
# Fixed effects. Commonly denoted as X.
X = patsy.dmatrix('1 + block * direction', samples, return_type='dataframe')
X = np.asarray(X)
# Design matrix for the random effects. Intercept and slope are modelled separately to have more control on the prior.
Z_intercept = patsy.dmatrix('0 + user', data=samples, return_type='dataframe')
Z_intercept = np.asarray(Z_intercept)
Z_slope = patsy.dmatrix('0 + user:direction', data=samples, return_type='dataframe')
Z_slope = np.asarray(Z_slope)
Z = np.concatenate((Z_intercept, Z_slope), axis=1)
Y = np.asarray(samples['log_projection_sq'])
with pm.Model(coords=coords) as hlm_model:
## Fixed effect
beta_X_intercept = pm.Normal('beta_X_ortho', mu=0, sd=10) # contrain it to positive values
beta_X_slope_b2 = pm.Normal('beta_X_b2_ortho_offset', mu=0, sd=10)
beta_X_slope_b3 = pm.Normal('beta_X_b3_ortho_offset', mu=0, sd=10)
beta_X_slope_para = pm.Normal('beta_X_parallel_offset', mu=0, sd=10)
beta_X_slope_b2_para = pm.Normal('beta_X_b2_parallel_offset', mu=0, sd=10)
beta_X_slope_b3_para = pm.Normal('beta_X_b3_parallel_offset', mu=0, sd=10)
beta_X = tt.stack([beta_X_intercept,
beta_X_slope_b2,
beta_X_slope_b3,
beta_X_slope_para,
beta_X_slope_b2_para,
beta_X_slope_b3_para
])
estimate_X = pm.math.dot(X, beta_X)
## Random effect
sigma_Z_intercept = pm.HalfNormal('sigma_Z_intercept', sd=10)
gamma_Z_intercept_raw = pm.Normal('gamma_Z_intercept_raw', mu=0, sd=1, shape=Z_intercept.shape[1])
gamma_Z_intercept = pm.Deterministic('gamma_Z_intercept', gamma_Z_intercept_raw * sigma_Z_intercept)
sigma_Z_slope = pm.HalfNormal('sigma_Z_slope', sd=10)
gamma_Z_slope_raw = pm.Normal('gamma_Z_slope_raw', mu=0, sd=1, shape=Z_slope.shape[1])
gamma_Z_slope = pm.Deterministic('gamma_Z_slope', gamma_Z_slope_raw * sigma_Z_slope)
estimate_Z = pm.math.dot(Z_intercept, gamma_Z_intercept) + pm.math.dot(Z_slope, gamma_Z_slope)
## likelihood
mu_estimate = pm.Deterministic('mu_estimate', estimate_X + estimate_Z)
sigma_unexplained = pm.HalfNormal('sigma_unexplained', sd=10) # unexplained variability
y_likelihood = pm.Normal('y_likelihood', mu=mu_estimate, sd=sigma_unexplained, observed=Y)
Sampling was slow and I got all kinds of errors:
There were 19 divergences after tuning. Increase
target_accept
or reparameterize.
There were 55 divergences after tuning. Increasetarget_accept
or reparameterize.
There were 156 divergences after tuning. Increasetarget_accept
or reparameterize.
The acceptance probability does not match the target. It is 0.6974308250411304, but should be close to 0.8. Try to increase the number of tuning steps.
There were 18 divergences after tuning. Increasetarget_accept
or reparameterize.
The number of effective samples is smaller than 10% for some parameters.
I’m also not clear on how to interpret the estimates when using design matrices. Are all the slopes offsets to the orthogonal projection in block 1 (intercept)? Or is ‘beta_X_b2_parallel_offset’, for example, the difference to ‘beta_X_parallel_offset’?
Also in order to stack all the slopes I had to write them down one-by-one, because I couldn’t stack tensors with different amount of columns, e.g. if I had one beta_X_slope with shape 6. If I added condition and more interactions this would get repetitive quite quickly. Is there a vectorized way for this?
Obviously this wasn’t the best attempt. How can it be improved? How can the coords be utilized?