How to recreate PPC from linear model coefficients?

jvivian · December 5, 2018, 12:03am

EDIT: I worked it out by trying to emulate the shape of sample_ppc. I iterated over each sample in X_i and chose the appropriate corresponding beta weight. A moderator can feel free to delete this post.

I have a hierarchical Bayesian model (thanks @junpenglao!) that models gene expression of a single sample against a background cohort and the results that I get out of sample_ppc are great.

However, I’d like to use the coefficients in the linear model to recapitulate the distribution I get out of sample_ppc so I can get a posterior for genes that I did not train with (due to runtime limitations), but my attempts produce a different distribution and I’m not sure why.

From sample_ppc (posterior is Laplacian, ppp is posterior predictive p-value relative to the vertical line):

If I use the trace coefficients and sampling or averages from X_i, I get something like the below for the same gene:

No matter how I use the trace coefficients, I can’t seem to recreate the result I get from sample_ppc.

My process is:

Calculate expected value of linear model by iterating over trace and sampling from X_i. Or using mean of trace coefs and mean of different X_i groups.
Use np.random.laplace or scipy.stats.laplace.rvswith loc set to the above and scale set to the model error parameter in the trace.
Plot

My X_i are different sizes which is why I randomly sample or take the average over X_i.

Thank you for any help or feedback, example code is below where t is the trace and classes are the different X_i groups.

gene = 'PARP1'
zs = []
for i, a in enumerate(t['a']):
    z = a
    for j, g in enumerate(classes):
        samples = np.random.choice(normal[normal['tissue'] == g][gene])
        z += t['b'][i, j] * samples
    zs.append(z)

z = np.random.laplace(loc=zs, scale=t['eps'], size=(500, 2000)).ravel()

z_true = sample[gene]
ppp = round(sum(z_true < z) / len(z), 4)

plt.title(f'Gene: {gene} - PPP: {ppp}')
plt.axvline(sample[gene])
sns.kdeplot(z);

junpenglao · December 6, 2018, 5:54am

Glad you found the solution! I will leave this post here as we always like a post answer itself

Michael_Johns · July 24, 2019, 11:34pm

I have encountered a similar problem trying to reproduce the mean of the ppc using the coefficients from a simple linear model. I’m using the means of the traces for each beta applied to the data to try to produce the predicted value, as you would in a standard OLS regression framework. The resulting predicted values are highly correlated with the mean of the ppc but the values are off by a constant. I’m trying to understand why this is the case. It sounds like a similar situation to what @jvivian describes; I’d be interested to get more details on his solution.

Topic		Replies	Views
Recommended approach for OOS prediction Questions	1	670	June 19, 2017
Model not reproducing observed data? Questions	9	1061	July 25, 2018
Concepts of Parameter Estimation and Predictions, and Out of Sample Predicted Probability for Logistic Regression Questions	5	1344	May 11, 2018
Sampling from glm posterior Questions	1	412	July 10, 2018
Sample from Prior Distribution Questions	0	318	April 15, 2020

How to recreate PPC from linear model coefficients?

Related Topics