EDIT: I worked it out by trying to emulate the shape of `sample_ppc`: I iterated over each sample in X_i and chose the corresponding beta weight for that draw. A moderator can feel free to delete this post.
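In case it helps anyone who lands here, a minimal sketch of that fix with invented stand-in data (the trace layout `t['a']`, `t['b']`, `t['eps']` matches the code at the bottom of this post; the group expression pools and all shapes are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: 2000 posterior draws, 3 background groups,
# and a pool of synthetic expression values per group.
n_draws, n_groups = 2000, 3
t = {'a': rng.normal(0, 1, n_draws),              # intercept draws
     'b': rng.normal(0, 1, (n_draws, n_groups)),  # per-group beta draws
     'eps': np.abs(rng.normal(1, 0.1, n_draws))}  # Laplace scale draws
group_expr = [rng.normal(5 + g, 1, 100) for g in range(n_groups)]

# The key step: for each posterior draw i, sample one expression value
# per group and weight it by THAT draw's j-th beta, so the sampled X_i
# stays paired with its corresponding coefficient.
zs = np.empty(n_draws)
for i in range(n_draws):
    z = t['a'][i]
    for j in range(n_groups):
        z += t['b'][i, j] * rng.choice(group_expr[j])
    zs[i] = z

# Add Laplace observation noise to get posterior predictive draws.
ppc = rng.laplace(loc=zs, scale=t['eps'])
print(ppc.shape)  # one predictive draw per posterior draw
```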
I have a hierarchical Bayesian model (thanks @junpenglao!) that models gene expression of a single sample against a background cohort, and the results I get out of `sample_ppc` are great.
However, I’d like to use the coefficients in the linear model to recapitulate the distribution I get out of `sample_ppc`, so I can get a posterior for genes that I did not train on (due to runtime limitations). My attempts, though, produce a different distribution and I’m not sure why.
From `sample_ppc` (the posterior is Laplacian; PPP is the posterior predictive p-value relative to the vertical line):
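For anyone unfamiliar with the term: the posterior predictive p-value here is just the fraction of posterior predictive draws falling on one side of the observed value. A toy computation with invented numbers (Laplace draws and an arbitrary observed value, not from my model):

```python
import numpy as np

rng = np.random.default_rng(42)
ppc_draws = rng.laplace(loc=0.0, scale=1.0, size=10_000)  # toy posterior predictive draws
y_obs = 0.5                                               # toy observed expression value

# PPP relative to the vertical line at y_obs:
# fraction of predictive draws above the observation.
ppp = np.mean(y_obs < ppc_draws)
```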
If I use the trace coefficients with either sampling from X_i or the X_i averages, I get something like the below for the same gene:
No matter how I use the trace coefficients, I can’t seem to recreate the result I get from `sample_ppc`.
My process is:

- Calculate the expected value of the linear model, either by iterating over the trace and sampling from X_i, or by using the mean of the trace coefficients and the mean of each X_i group.
- Use `np.random.laplace` or `scipy.stats.laplace.rvs` with `loc` set to the above and `scale` set to the model error parameter in the trace.
- Plot.
My X_i are different sizes, which is why I randomly sample from or take the average over each X_i.
Thank you for any help or feedback. Example code is below, where `t` is the trace and `classes` are the different X_i groups.
```python
gene = 'PARP1'

# Expected value of the linear model for each posterior draw
zs = []
for i, a in enumerate(t['a']):
    z = a
    for j, g in enumerate(classes):
        # Draw one expression value for this gene from group g
        samples = np.random.choice(normal[normal['tissue'] == g][gene])
        z += t['b'][i, j] * samples
    zs.append(z)

# Add Laplace noise, with the model's error parameter as the scale
z = np.random.laplace(loc=zs, scale=t['eps'], size=(500, 2000)).ravel()

# Posterior predictive p-value relative to the observed value
z_true = sample[gene]
ppp = round(sum(z_true < z) / len(z), 4)

plt.title(f'Gene: {gene} - PPP: {ppp}')
plt.axvline(sample[gene])
sns.kdeplot(z);
```
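As an aside, the double loop over draws and groups can usually be vectorized with broadcasting. A sketch with invented stand-in arrays (same hypothetical shapes as above: `b` is draws × groups, one expression value sampled per group per draw):

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws, n_groups = 2000, 3

# Hypothetical stand-ins for the trace and per-group expression pools.
a = rng.normal(0, 1, n_draws)                      # intercept draws
b = rng.normal(0, 1, (n_draws, n_groups))          # beta draws
eps = np.abs(rng.normal(1, 0.1, n_draws))          # Laplace scale draws
pools = [rng.normal(5 + g, 1, 100) for g in range(n_groups)]

# Sample one value per (draw, group), then contract over groups in one step.
x = np.column_stack([rng.choice(p, size=n_draws) for p in pools])  # (n_draws, n_groups)
zs = a + (b * x).sum(axis=1)                       # expected value per draw
z = rng.laplace(loc=zs, scale=eps)                 # posterior predictive draws
```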