It sounds like you’re describing some kind of nested trial design, with replicates. I’ll assume there are two arms (“drug”, “placebo”) and one nesting variable (“knockout_A”, “knockout_B”, “knockout_C”, “WT”), each of which have some number of replicates. This is classical linear model territory, and is fairly straightforward.
if the knockouts are to be compared. If you have the observations at the granular level (i.e. you’re not observing means) then BEST can be updated straightforwardly. This generates some replicated data
import numpy as np
import pandas as pd
n_replicates = 4
is_drug = np.hstack([np.ones(4), np.zeros(4)])
eye = np.diag(np.ones(4))
design = np.hstack([is_drug.reshape(8,1), np.vstack([eye, eye])])
design = pd.DataFrame(np.vstack([design] * 4))
design.columns=['drug', 'ko_A', 'ko_B', 'ko_C', 'WT']
effects = np.array([0.4, -0.1, 0.3, 0.1, 0.0])
total_r2 = 0.75
true_values = np.dot(design.values, effects)
vtrue = np.var(true_values)
vobs = vtrue/total_r2 # has a tendency to really fluctuate
verr = vobs - vtrue
observed_values = true_values + np.random.normal(scale=np.sqrt(verr), size=(true_values.shape[0],))
The approach in the example parametrizes the setting using the difference of means between groups. We’re going to do that here as well, but making this hierarchical means treating this difference as a random variable per group, with a shared distribution.
group_index = {gname: np.where(design.loc[:, gname] == 1)[0] for gname in design.columns[1:]}
with pm.Model() as hierarchical:
drug_eff_overall = pm.Normal('drug_eff', 0., 1.)
intercept_prior = pm.Normal('intercept', 0., 1.)
err_sd_hp = pm.HalfNormal('error_sd_hp', 1.)
# build the per_group submodel
for grp in group_index:
treatment = design.iloc[group_index[grp], 0]
obs = observed_values[group_index[grp]]
group_effect = pm.Normal('{}_eff'.format(grp), intercept_prior, 0.1)
group_drug_delta = pm.Normal('{}_drug_delta'.format(grp),
drug_eff_overall, 0.1)
# i have pretty good reason to believe the effects
# are the same across groups
lat_metric = pm.Deterministic('{}_lat_met'.format(grp), group_drug_delta * treatment + group_effect)
obs_metric = pm.Normal('{}_metric'.format(grp), lat_metric, err_sd_hp, observed=obs)
fit_res = pm.sample(500, cores=1)
Then the plots
import seaborn as sbn
from matplotlib import pyplot as plt
wnames = [x for x in fit_res.varnames if x[-2:] != '__' and 'lat_' not in x]
vardf = pd.DataFrame(np.hstack([pd.DataFrame(fit_res[x]) for x in wnames]))
vardf.columns = wnames
def add0(xvals, **kwargs):
plt.gca().axvline(linewidth=2, color='r')
return
pp = sbn.pairplot(vardf.loc[:,np.array(['intercept', 'drug_eff'] + [x for x in wnames if 'delta' in x])])
pp.map_diag(add0)
plt.figure()
pp = sbn.pairplot(vardf.loc[:, np.array([x for x in wnames if 'eff' in x])])
pp.map_diag(add0)
One thing i noticed is that it’s not super straightforward to perform hierarchical nesting given a design matrix. I’m used to syntax like “y ~ 1 + group + drug|group” . I guess I could create an auxiliary design matrix as design.values[:,0][:, np.newaxis] * design.values[:,1:]
and then use
drug_eff = pm.Normal(..., shape=(4,))
lat = pm.Deterministic('lat', group_eff + tt.dot(aux_design, drug_eff))`
but that seems to introduce a lot of 0*variable
into the graph. Is there a “right” way to do this?