# How to recreate PPC from linear model coefficients?

EDIT: I worked it out by trying to emulate the shape of `sample_ppc`. I iterated over each sample in X_i and chose the appropriate corresponding beta weight. A moderator can feel free to delete this post.

I have a hierarchical Bayesian model (thanks @junpenglao!) that models gene expression of a single sample against a background cohort and the results that I get out of `sample_ppc` are great.

However, I’d like to use the coefficients in the linear model to recapitulate the distribution I get out of `sample_ppc` so I can get a posterior for genes that I did not train with (due to runtime limitations), but my attempts produce a different distribution and I’m not sure why.

From `sample_ppc` (posterior is Laplacian, ppp is posterior predictive p-value relative to the vertical line): If I use the trace coefficients and sampling or averages from X_i, I get something like the below for the same gene: No matter how I use the trace coefficients, I can’t seem to recreate the result I get from `sample_ppc`.

My process is:

1. Calculate expected value of linear model by iterating over trace and sampling from X_i. Or using mean of trace coefs and mean of different X_i groups.
2. Use `np.random.laplace` or `scipy.stats.laplace.rvs`with `loc` set to the above and `scale` set to the model error parameter in the trace.
3. Plot

My X_i are different sizes which is why I randomly sample or take the average over X_i.

Thank you for any help or feedback, example code is below where `t` is the trace and `classes` are the different X_i groups.

``````gene = 'PARP1'
zs = []
for i, a in enumerate(t['a']):
z = a
for j, g in enumerate(classes):
samples = np.random.choice(normal[normal['tissue'] == g][gene])
z += t['b'][i, j] * samples
zs.append(z)

z = np.random.laplace(loc=zs, scale=t['eps'], size=(500, 2000)).ravel()

z_true = sample[gene]
ppp = round(sum(z_true < z) / len(z), 4)

plt.title(f'Gene: {gene} - PPP: {ppp}')
plt.axvline(sample[gene])
sns.kdeplot(z);
``````

Glad you found the solution! I will leave this post here as we always like a post answer itself 1 Like

I have encountered a similar problem trying to reproduce the mean of the ppc using the coefficients from a simple linear model. I’m using the means of the traces for each beta applied to the data to try to produce the predicted value, as you would in a standard OLS regression framework. The resulting predicted values are highly correlated with the mean of the ppc but the values are off by a constant. I’m trying to understand why this is the case. It sounds like a similar situation to what @jvivian describes; I’d be interested to get more details on his solution.