Model Comparison: How to constrain a model to positive mu-differences?

Olaf · September 26, 2020, 2:21am

I found some resources that explain design matrices theoretically (video1, video2,wiki), and according to this stackexchange answer there’s a slight difference between the design matrix and the model matrix, (based on Montgomery, D. (2009). Design and Analysis of Experiments, 7th Edition. John Wiley & Sons Inc.).

This notebook makes use of patsy and PyMC3, but doesn’t explain what’s going on there. Another notebook on a sleep deprivation study is a bit more informative, because it shows how to do only random intercepts/slopes or both, but also doesn’t go into much detail about the why and how. It doesn’t provide an interpretation of the results, unfortunately.
A hierarchical model from this thread is using some indexing scheme that I don’t quite grasp. And just to collect some more useful information, here’s a thread that further discusses hierarchical models.
There seems to be a lack of a step-by-step in-depth guide with best practices on HLMs in PyMC3 that’s beginner-friendly. Mostly it’s just notebooks replicating some calculations without much explanation.

Based on the sleep deprivation study notebook I tried applying it to my case.

# Fixed effects. Commonly denoted as X.
X = patsy.dmatrix('1 + block * direction', samples, return_type='dataframe')
X = np.asarray(X)

# Design matrix for the random effects. Intercept and slope are modelled separately to have more control on the prior.
Z_intercept = patsy.dmatrix('0 + user', data=samples, return_type='dataframe')
Z_intercept = np.asarray(Z_intercept)
Z_slope = patsy.dmatrix('0 + user:direction', data=samples, return_type='dataframe')
Z_slope = np.asarray(Z_slope)
Z = np.concatenate((Z_intercept, Z_slope), axis=1)
Y = np.asarray(samples['log_projection_sq'])

with pm.Model(coords=coords) as hlm_model:
    ## Fixed effect
    beta_X_intercept = pm.Normal('beta_X_ortho', mu=0, sd=10) # contrain it to positive values
    beta_X_slope_b2 = pm.Normal('beta_X_b2_ortho_offset', mu=0, sd=10)
    beta_X_slope_b3 = pm.Normal('beta_X_b3_ortho_offset', mu=0, sd=10)
    beta_X_slope_para = pm.Normal('beta_X_parallel_offset', mu=0, sd=10)
    beta_X_slope_b2_para = pm.Normal('beta_X_b2_parallel_offset', mu=0, sd=10)
    beta_X_slope_b3_para = pm.Normal('beta_X_b3_parallel_offset', mu=0, sd=10)
    beta_X = tt.stack([beta_X_intercept,
                       beta_X_slope_b2,
                       beta_X_slope_b3,
                       beta_X_slope_para,
                       beta_X_slope_b2_para,
                       beta_X_slope_b3_para
                      ])
    
    estimate_X = pm.math.dot(X, beta_X)
    
    ## Random effect
    sigma_Z_intercept = pm.HalfNormal('sigma_Z_intercept', sd=10)
    gamma_Z_intercept_raw = pm.Normal('gamma_Z_intercept_raw', mu=0, sd=1, shape=Z_intercept.shape[1])
    gamma_Z_intercept = pm.Deterministic('gamma_Z_intercept', gamma_Z_intercept_raw * sigma_Z_intercept)

    sigma_Z_slope = pm.HalfNormal('sigma_Z_slope', sd=10)
    gamma_Z_slope_raw = pm.Normal('gamma_Z_slope_raw', mu=0, sd=1, shape=Z_slope.shape[1])
    gamma_Z_slope = pm.Deterministic('gamma_Z_slope', gamma_Z_slope_raw * sigma_Z_slope)
    
    estimate_Z = pm.math.dot(Z_intercept, gamma_Z_intercept) + pm.math.dot(Z_slope, gamma_Z_slope) 
    
    ## likelihood
    mu_estimate = pm.Deterministic('mu_estimate', estimate_X + estimate_Z)
    sigma_unexplained = pm.HalfNormal('sigma_unexplained', sd=10) # unexplained variability
    y_likelihood = pm.Normal('y_likelihood', mu=mu_estimate, sd=sigma_unexplained, observed=Y)

Sampling was slow and I got all kinds of errors:

There were 19 divergences after tuning. Increase target_accept or reparameterize.
There were 55 divergences after tuning. Increase target_accept or reparameterize.
There were 156 divergences after tuning. Increase target_accept or reparameterize.
The acceptance probability does not match the target. It is 0.6974308250411304, but should be close to 0.8. Try to increase the number of tuning steps.
There were 18 divergences after tuning. Increase target_accept or reparameterize.
The number of effective samples is smaller than 10% for some parameters.

I’m also not clear on how to interpret the estimates when using design matrices. Are all the slopes offsets to the orthogonal projection in block 1 (intercept)? Or is ‘beta_X_b2_parallel_offset’, for example, the difference to ‘beta_X_parallel_offset’?
Also in order to stack all the slopes I had to write them down one-by-one, because I couldn’t stack tensors with different amount of columns, e.g. if I had one beta_X_slope with shape 6. If I added condition and more interactions this would get repetitive quite quickly. Is there a vectorized way for this?

Obviously this wasn’t the best attempt. How can it be improved? How can the coords be utilized?

Topic		Replies	Views
Hierarchical regression models for ratings data ( 2 by 2 within-subject design) Questions	3	1741	December 12, 2019
Correlated slopes in multivariate model Questions	9	3445	July 14, 2018
Create model matrix Questions	7	1489	June 23, 2021
Posterior Predictive not similar to Posterior v5 bug , modeling , hierarchical	7	823	September 24, 2022
Hierarchical logistic regression giving non-sensible results? Do I formulate it correctly? Questions	15	844	August 4, 2020

Model Comparison: How to constrain a model to positive mu-differences?

Related topics