Variance and Parameter Ranking in Bayesian Inference

Hello everyone,

I’m a relative newcomer to Bayesian inference, so forgive me if my question seems a bit elementary. Because of your valuable assistance in the past, I’m reaching out again for some guidance. In my experiment, I have 20 groups (in my case, genes) and I’m trying to discern differences in expression between two samples, Ht0 and Ht1, for each gene. My main goal is to rank these differences.

After running my model, I obtain the beta parameter, which behaves as expected. However, I’m finding it difficult to decide how to rank the betas. An obvious method would be to take the posterior mean of beta, but this could neglect the effect of the variances. In frequentist terms, I would typically compare Ht0 to Ht1, carry out an ANOVA, and obtain a p-value for each gene.

At one point I came across a suggestion to use the Kullback-Leibler divergence (as a distance matrix) for this, but I have been unable to find a detailed tutorial on applying it in PyMC. I checked the PyMC t-test page but didn’t find it very helpful for my specific project.

If I decide to rank the betas by their means alone, how do I convincingly argue that the ranking isn’t distorted by the variances? Any advice or insights would be greatly appreciated. A big thank you in advance!

On a side note, here’s the simulation model I’ve been using (it works flawlessly with simulated data):
Data generation:

\begin{align}
H0 &= z \times M + u \times U \\
Ht &= sg \times H0
\end{align}
import numpy as np
import pandas as pd
import pymc as pm

RANDOM_SEED = 8927  # any fixed seed works here

samples = 50
Genes_num = 20
sd = (0.1, 0.8, 0.5)
z_ef = [1] * 20
sg_ef = pm.draw(pm.Gamma.dist(mu=1, sigma=1), draws=20, random_seed=RANDOM_SEED)
u_ef = [1] * 20

M = np.tile(np.linspace(0.5, 1.5, Genes_num), (samples, 1))
z_ = pm.TruncatedNormal.dist(mu=z_ef, sigma=sd[0], lower=0.1, upper=10)  # no change
z = pm.draw(z_, draws=samples, random_seed=RANDOM_SEED)

U = np.tile(np.linspace(0.8, 1.8, Genes_num), (samples, 1))
u_ = pm.TruncatedNormal.dist(mu=u_ef, sigma=sd[1], lower=0.1, upper=10)  # no change
u = pm.draw(u_, draws=samples, random_seed=RANDOM_SEED)

# Baseline expression for each gene
H0 = z * M + u * U

sg_ = pm.TruncatedNormal.dist(mu=sg_ef, sigma=sd[2], lower=0.1, upper=10)
sg = pm.draw(sg_, draws=samples, random_seed=RANDOM_SEED)

# Treated expression: baseline scaled by the per-gene effect sg
Ht = sg * H0

df_mod1 = pd.DataFrame({"sgRNA": np.tile(np.arange(0, Genes_num, 1), samples), "b": sg.ravel(),
                        "H0": H0.ravel(), "Ht": Ht.ravel(), "M": M.ravel()})

Model:

\begin{align}
Ht &\sim \text{Normal}(\mu, \sigma) \tag{1} \\
\mu &= \beta_{j} \times H0 \\
\beta_{j} &\sim \text{Normal}(\tilde{\beta}, \tilde{\sigma}) \\
\sigma &\sim \text{HalfNormal}(0.5)
\end{align}
sgRNA, mn_sgRNA = df_mod1.sgRNA.factorize()
X = df_mod1.H0.values
Y = df_mod1.Ht.values
coords = {"sgRNA": mn_sgRNA}

with pm.Model(coords=coords) as model1:
    # Hyper-priors
    beta_bar = pm.Normal("beta_bar", 1, 2)
    sigma_bar = pm.Exponential("sigma_bar", 1)

    # Non-centered random slopes: a standard-normal offset scaled by sigma_bar
    z_beta = pm.Normal("z_beta", 0, 1, dims="sgRNA")
    beta = pm.Deterministic("beta", beta_bar + z_beta * sigma_bar)

    # Index as mutable data so it can be swapped out with pm.set_data later
    sgRNA_idx = pm.Data("sgRNA_idx", sgRNA)
    Mu = pm.Deterministic("Mu", beta[sgRNA_idx] * X)

    sigma = pm.Exponential("sigma", 1)

    count = pm.Normal("count", mu=Mu, sigma=sigma, observed=Y)
    trace_mod1 = pm.sample(1000, tune=1000, target_accept=0.95, random_seed=RANDOM_SEED)

I think you just need to do some posterior predictive sampling. After you fit your model and obtain trace_mod1, you can make a new “dummy” model like this:

coords.update({'sgRNA_bis': coords['sgRNA']})
with pm.Model(coords=coords):
    beta_bar = pm.Flat("beta_bar")
    sigma_bar = pm.Flat("sigma_bar")

    # Non-centered random slopes, re-declared with Flat "placeholder" priors
    z_beta = pm.Flat("z_beta", dims="sgRNA")
    beta = pm.Deterministic("beta_post", beta_bar + z_beta * sigma_bar)

    beta_dist = pm.Deterministic('beta_dist', beta[:, None] - beta[None],
                                 dims=['sgRNA', 'sgRNA_bis'])
    idata_dist = pm.sample_posterior_predictive(trace_mod1, var_names=['beta_dist'])

To understand what this code block does, maybe it’s best to go from the bottom to the top:

  1. I want to sample from the posterior, which is contained inside trace_mod1, so I pass it to pm.sample_posterior_predictive. The quantity I want to sample is called beta_dist, and I save the result to a new idata, idata_dist (this is important: you don’t want to overwrite your posterior!)
  2. beta_dist is just a matrix of pairwise differences between every combination of beta
  3. To obtain beta, I need all of its “parent” random variables, that is, the variables that point into it on this DAG:

  1. So I add beta to this new dummy model, with a new name, beta_post (more on the name in a second)
  2. I also add the parent RVs, beta_bar, sigma_bar, and z_beta, to the dummy model, with pm.Flat distributions.

Internally, when you call pm.sample_posterior_predictive, PyMC will seek out variables in the model that have names in the posterior group of the InferenceData object you provide. If it finds a hit, it replaces the random variable in the model with samples from the posterior. In this case, it will find beta_bar, sigma_bar, and z_beta. Putting pm.Flat distributions on these was a diagnostic check – if PyMC can’t find these in the posterior, it will raise an error (because it can’t sample from pm.Flat).

So next it will take those posterior samples and use them to re-compute beta_post, which it does not find in the posterior, and finally use that to compute beta_dist.

You can then use az.plot_posterior like this to look at the results one-by-one, or az.plot_forest to quickly look at everything all at once. I added that sgRNA_bis coord so I could label both dimensions of the distance matrix for easy plotting. Here I use az.plot_posterior to plot the difference between the beta for sgRNA number 3 and the betas of the first 6 other sgRNAs (excluding itself):

import arviz as az

az.plot_posterior(idata_dist.posterior_predictive,
                  var_names=['beta_dist'],
                  coords={'sgRNA': mn_sgRNA[3],
                          'sgRNA_bis': mn_sgRNA[:3].tolist() + mn_sgRNA[4:7].tolist()});

So we can see that the beta for sgRNA[3] stochastically dominates those of sgRNA[0, 1, 2, 4, 5], but is dominated by sgRNA[6]. (Maybe there’s some room for debate about strict dominance in some of these cases, because p(0) > \epsilon for some tiny epsilon in cases 0 and 2?)
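If you then want a single number per pair to rank on, one option (my suggestion here, not the only possible reduction) is the probability of superiority, P(beta_i > beta_j), computed straight from idata_dist:

# Fraction of posterior samples in which row i's beta exceeds column j's
p_greater = (idata_dist.posterior_predictive["beta_dist"] > 0).mean(dim=["chain", "draw"])

# Rank sgRNAs by their average dominance over all others
# (the self-pair difference is exactly 0, so it shifts every row equally)
ranking = p_greater.mean(dim="sgRNA_bis").to_series().sort_values(ascending=False)

Because this is computed from the full posterior of differences, it folds the variances into the ranking rather than ignoring them the way a plain comparison of means would.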

Thank you so much for your thorough response.

When I tried to execute the “dummy” model, it flagged that other parameters like sigma and Mu were missing. The model started to work once I gave these parameters pm.Flat distributions too, but I’m not completely sure why this step is necessary. My understanding is that the values of these parameters are already inferred in the fitted model, so why do we need to assign flat priors?

Additionally, I have been struggling to decipher the RV shapes in the posterior predictive analysis (PPA). Usually I run the model and then use beta.eval().shape to understand the dimensions, but this approach seems to fail for the PPA. As a result, I don’t understand why we use beta[:, None]; in the fitted model, beta has 20 sgRNAs, 1000 draws, and 4 chains.

In my actual experiment, there is another level of clustering, relevant to the analysis, that I have not yet addressed. The data in the model is clustered into sgRNA groups where every five unique sgRNAs are associated with a single gene, a common design in biology where sgRNAs are used to target genes for silencing.

The structure resembles this:
sgRNA index = [0, 1, ..., 4, 5, ..., 9, 10, ..., 14, 15, ..., 19]
Gene = [0, 0, ..., 0, 1, ..., 1, 2, ..., 2, 3, ..., 3]

sgRNA = np.arange(0, 20, 1).tolist()
Gene = np.repeat(np.arange(0, 4), 5).tolist()  # 4 genes, 5 sgRNAs each

To compute beta_dist at the gene level, how do I integrate the gene indexing vector into the posterior predictive analysis?

Lastly, in my actual research, approximately 13,000 sgRNAs target around 2,000 genes. If my calculations hold, the number of gene-level beta pairs is fairly large (2000 × 1999 / 2 ≈ 2 million), and it grows further still if done at the sgRNA level.

Your assistance and the resources of this forum and the PyMC program have been incredibly valuable! Thank you once again.

I should have linked you this blog post, which goes deeper into how this dummy-model trick works. I also tried to explain it here; maybe that will help?

The dummy model needs all the information required to compute the nodes you specify in var_names. In my example, var_names was only beta_dist, which needs to know beta, which (looking at the DAG) needs to know beta_bar, sigma_bar, and z_beta. If you instead asked for a variable deeper in the DAG, like count, you’d have to specify everything upstream of it, which would include sigma and Mu; a sketch follows below.
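For example, a count-level dummy model might look like this (a hypothetical sketch reusing X and the sgRNA index array from the original script; the re-computed nodes get new names so PyMC recomputes them instead of looking them up in the posterior):

with pm.Model(coords=coords):
    beta_bar = pm.Flat("beta_bar")
    sigma_bar = pm.Flat("sigma_bar")
    z_beta = pm.Flat("z_beta", dims="sgRNA")
    sigma = pm.Flat("sigma")  # now required: count depends on it

    beta = pm.Deterministic("beta_post", beta_bar + z_beta * sigma_bar)
    Mu = beta[sgRNA] * X      # everything upstream of count is re-declared

    count = pm.Normal("count_post", mu=Mu, sigma=sigma)
    idata_count = pm.sample_posterior_predictive(trace_mod1, var_names=["count_post"])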

While the variables are just PyTensor symbols on a computation graph, that is, before you compile and run the model, you don’t have to reason about the batch dimensions (chains and draws). You only have to think about the core dimensions. So on the graph, beta is just a (20,) array, and beta[:, None] is a (20, 1) column vector. PyTensor’s whole raison d’être is to handle the vectorization from there. Just focus on the shapes as you would understand them if you were writing out your model on a piece of paper. Or better yet, use dims absolutely everywhere, and reason about the named dimensions instead of the shapes.
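A quick toy check of that core-shape reasoning (plain numpy, not from the thread): one “core” beta is a (20,) vector, its pairwise difference is (20, 20), and the chain and draw dimensions are only added in front once you actually sample.

import numpy as np

beta_core = np.random.default_rng(0).normal(size=20)   # one "core" beta: shape (20,)
beta_dist_core = beta_core[:, None] - beta_core[None]  # (20, 1) - (1, 20) -> (20, 20)
print(beta_dist_core.shape)                            # (20, 20); sampling adds (chain, draw) in front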

Actually, there’s nothing special about sample_posterior_predictive. If all you wanted were pairwise differences between the betas, you could do it directly on the (4, 1000, 20) posterior with something like idata.posterior.beta.values[..., :, None] - idata.posterior.beta.values[..., None, :]. Here’s an example where I first tried to do this, then realized it was needlessly complicated and did it more elegantly with sample_posterior_predictive in a dummy model.
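If you prefer labeled dimensions over raw .values arrays, the same pairwise difference also falls out of xarray broadcasting (a sketch; the rename only exists to create the second pairwise dimension):

beta = trace_mod1.posterior["beta"]                     # dims: (chain, draw, sgRNA)
beta_dist = beta - beta.rename({"sgRNA": "sgRNA_bis"})  # broadcasts to (chain, draw, sgRNA, sgRNA_bis)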

Exactly the same way you put it into your model; it should be a 1-to-1 map. Copy and paste everything you need up until you compute beta in your primary model, then use that beta to compute some statistic of interest (beta_dist in my example), then call pm.sample_posterior_predictive(idata, var_names=[statistic_of_interest]). If you get stuck, I’m happy to look at specific code.
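To make that concrete for the gene level, here is one possible sketch. My assumption here is that a gene’s effect is summarized as the average of its sgRNAs’ betas; if you instead add a gene-level parameter to your primary model, copy that computation here instead.

import numpy as np
import pymc as pm

gene_idx = np.repeat(np.arange(4), 5)                  # maps each of the 20 sgRNAs to its gene
G = (gene_idx[:, None] == np.arange(4)).astype(float)  # (20, 4) membership matrix
G /= G.sum(axis=0)                                     # columns now average within a gene

coords.update({"gene": np.arange(4), "gene_bis": np.arange(4)})
with pm.Model(coords=coords):
    beta_bar = pm.Flat("beta_bar")
    sigma_bar = pm.Flat("sigma_bar")
    z_beta = pm.Flat("z_beta", dims="sgRNA")
    beta = beta_bar + z_beta * sigma_bar               # sgRNA-level slopes, exactly as before

    # Gene-level effect: average of each gene's sgRNA betas
    beta_gene = pm.Deterministic("beta_gene", pm.math.dot(beta, G), dims="gene")
    gene_dist = pm.Deterministic("gene_dist", beta_gene[:, None] - beta_gene[None],
                                 dims=("gene", "gene_bis"))
    idata_gene = pm.sample_posterior_predictive(trace_mod1, var_names=["gene_dist"])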

As for size, you’re just doing pairwise subtraction, so it should scale pretty well. If it gets really bad, you could compute only the unique pairs with some clever indexing, since the difference matrix is antisymmetric (the lower triangle is just the negative of the upper).
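At the 2000-gene scale, for instance, you could pull only the unique pairs out of the raw samples (a sketch; swap in whatever array actually holds your gene-level effects):

import numpy as np

beta = idata.posterior["beta"].values        # (chains, draws, n) array of effects
i, j = np.triu_indices(beta.shape[-1], k=1)  # indices of the n*(n-1)/2 unique pairs
pair_dist = beta[..., i] - beta[..., j]      # (chains, draws, n_pairs); no full n x n matrix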


Thank you so much for your response. I am still working through your explanation, but in the meantime I have noticed some strange behavior in my model. I believe I may have made an error in my understanding or implementation.

Based on the radon example, I compared complete pooling (CP), partial pooling (PP), and no pooling (NP) on the posterior distribution. Specifically, I ran a posterior predictive analysis to predict count data:

with model1:
    pm.set_data(dict(sgRNA_idx=sgRNA))
    post_pred_1_1 = pm.sample_posterior_predictive(
        trace_mod1, var_names=["count"], random_seed=RANDOM_SEED
    )["posterior_predictive"]

In the first plot, I plotted the count (posterior), and in the second plot, I plotted the beta (from the trace). The red horizontal lines represent the mean of the response variable, and the dashed line represents the population mean.

However, I observed that the predicted counts are not on the same scale as the data. Additionally, the beta values seem to be shrinking towards a mean that is much higher than the population mean.


I would appreciate any insights into where I may have gone wrong in my understanding or implementation.