Hello everyone,
I’m a relative newcomer to Bayesian inference, so forgive me if my question seems a bit elementary. Because of your valuable assistance in the past, I’m reaching out again for some guidance. In my experiment, I have 20 groups (in my case, genes) and I’m trying to discern differences in expression between two samples, Ht0 and Ht1, for each gene. My main goal is to rank these differences.
After running my model, I manage to obtain the beta parameter, which behaves as expected. However, I’m finding it difficult to determine the rate of beta. An obvious method would be to calculate the mean of Beta from the posterior, but this could potentially neglect the variances’ effect. In terms of frequentist statistics, I would typically compare H0 to H1, carry out an ANOVA test, and obtain a p-value for each gene.
At one point, I encountered a suggestion about using the Kullback-Leibler Divergence (distance matrix) to accomplish this, but have been unsuccessful in finding a detailed tutorial for applying it in PyMC. I checked out the PyMC T-test page but didn’t find it very helpful for my specific project.
If I decide to rank beta by only taking the mean, how do I convincingly argue that the ranking isn’t influenced by variance? Any advise or insights would be greatly appreciated. A big thank you in advance!
On a side-note, here’s the simulation model I’ve been using (which works flawlessly with simulated data):
Data genration
samples = 50
Genes_num = 20
sd = sd = (0.1,0.8,0.5)
z_ef = [1]*20
sg_ef = pm.draw(pm.Gamma.dist(mu = 1, sigma =1), draws=20, random_seed=RANDOM_SEED)
u_ef = [1]*20
M = np.tile(np.linspace(0.5,1.5,Genes_num),(samples,1))
z_ = pm.TruncatedNormal.dist(mu = z_ef, sigma = sd[0],lower = 0.1,upper = 10) #nochange
z = pm.draw(z_, draws=samples, random_seed=RANDOM_SEED)
U = np.tile(np.linspace(0.8,1.8,Genes_num),(samples,1))
u_ = pm.TruncatedNormal.dist(mu = u_ef, sigma = sd[1],lower = 0.1,upper = 10) # no change
u = pm.draw(u_, draws=samples, random_seed=RANDOM_SEED)
H0 = z*M + u*U
sg_ = pm.TruncatedNormal.dist(mu = sg_ef,sigma = sd[2],lower = 0.1,upper = 10) # no change
sg = pm.draw(sg_, draws=samples, random_seed=RANDOM_SEED)
Ht = (sg)*H0
df_mod1 = pd.DataFrame({"sgRNA":np.tile(np.arange(0,Genes_num,1),(samples)),"b":sg.ravel(),
"H0":H0.ravel(),"Ht":Ht.ravel(),"M":M.ravel()})
Model:
sgRNA,mn_sgRNA = df_mod1.sgRNA.factorize()
X = df_mod1.H0.values
Y = df_mod1.Ht.values
IV = df_mod1.Hi.values
coords = {"sgRNA":mn_sgRNA}
with pm.Model(coords=coords) as model1:
#Hyper-proirs
beta_bar = pm.Normal("beta_bar",1,2)
sigma_bar = pm.Exponential("sigma_bar",1)
# Non-centered random slopes
z_beta = pm.Normal("z_beta", 1, 2, dims="sgRNA")
beta = pm.Deterministic("beta", beta_bar + z_beta*sigma_bar)
Mu = pm.Deterministic("Mu",beta[sgRNA]*X)
sigma = pm.Exponential("sigma",1)
sgRNA_ = pm.Data("sgRNA", sgRNA)
count = pm.Normal("count",mu = Mu, sigma = sigma, observed=Y)
trace_mod1 = pm.sample(1000, tune=1000, target_accept=0.95, random_seed=RANDOM_SEED)