Which variables show up in sampling chain and which don't?

phildias · July 24, 2018, 4:21pm

Hi there,

I have a section of my code where I estimate a covariance matrix using the LKJCholeskyCov. However, this is making me really confused as to what randomly distributed variables in my model get a proper “name” and get explicitly sampled in my posterior chain.

Let me show what I mean. Here is an example of a section of my code:

# Assumes `import pymc3 as pm` and `import theano.tensor as tt`,
# and that this runs inside a `with pm.Model():` block.

# Priors for error covariance matrix
sd_dist = pm.HalfNormal.dist(sd=1, shape=num_dep_vars)
packed_chol = pm.LKJCholeskyCov('chol_cov', eta=2, n=num_dep_vars, sd_dist=sd_dist)
chol = pm.expand_packed_triangular(num_dep_vars, packed_chol, lower=True)

# Generate covariance matrix from Cholesky factor
cov_mtx = tt.dot(chol, chol.T)
cov_terms = pm.Deterministic('cov_terms', cov_mtx[cov_term_indices])

# Generate the standard deviations from the Cholesky factor
sd = pm.Deterministic('sd', tt.sqrt(tt.diag(cov_mtx)))

# Tensor manipulation used to extract the correlation terms
cor_mtx = tt.diag(sd**-1).dot(cov_mtx).dot(tt.diag(sd**-1))
cor_terms = pm.Deterministic('cor_terms', cor_mtx[cor_term_indices])

So I first create a half-normally distributed set of standard deviations, build up a vector of Cholesky factors, and then build up a covariance matrix. Then I start “going backwards”, extracting the variances, the standard deviations and, finally, the actual correlations.
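For readers following along, here is a minimal plain-Python sketch of that “going backwards” step, with no PyMC3 or Theano involved. The helper names `expand_packed_lower` and `cov_sd_corr` are made up for illustration, and the row-major packing order is an assumption about how the packed vector is laid out.

```python
import math

def expand_packed_lower(packed, n):
    """Unpack a packed lower triangle (row-major order assumed) into an n x n matrix."""
    mat = [[0.0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1):
            mat[i][j] = packed[k]
            k += 1
    return mat

def cov_sd_corr(chol):
    """Recover covariance, standard deviations and correlations from a Cholesky factor."""
    n = len(chol)
    # cov = chol @ chol.T
    cov = [[sum(chol[i][k] * chol[j][k] for k in range(n)) for j in range(n)]
           for i in range(n)]
    # Standard deviations are the square roots of the covariance diagonal
    sd = [math.sqrt(cov[i][i]) for i in range(n)]
    # corr = diag(1/sd) @ cov @ diag(1/sd)
    corr = [[cov[i][j] / (sd[i] * sd[j]) for j in range(n)] for i in range(n)]
    return cov, sd, corr

# Hypothetical 2-variable example: sd = (2, 3), correlation = 0.5.
# Its Cholesky factor is diag(sd) times the Cholesky factor of the correlation matrix.
packed = [2.0, 1.5, 3.0 * math.sqrt(0.75)]
cov, sd, corr = cov_sd_corr(expand_packed_lower(packed, 2))
print(sd, corr[0][1])
```

The standard deviations come back out of the Cholesky factor itself, which is why they can be recovered deterministically from `chol_cov` rather than needing their own sampled variable.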

But why do I have to do this to get to the standard deviations? Aren’t I already calculating them at the top of the code, when I say sd_dist = pm.HalfNormal.dist(sd=1, shape=num_dep_vars)? Why isn’t sd_dist being “watched” as a variable to be sampled from in the posterior, as is the packed_chol variable? Is it because sd_dist isn’t named? Also, why can I not give sd_dist a name?

I’m just getting a bit confused here with what gets a proper name and what gets sampled from in my posterior chains.

Thanks!

junpenglao · July 24, 2018, 4:42pm

LKJCholeskyCov is defined in terms of the LKJ correlation matrix, which contains no information about the variances (its diagonal is all 1s). Here we give it a distribution, sd_dist, so that the diagonal elements can be evaluated as well.
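As a sketch of the relationship being described (the symbols here are introduced for illustration and do not appear in the thread): write $R$ for the correlation matrix with unit diagonal, $L_R$ for its Cholesky factor, $\sigma$ for the vector of standard deviations, and $L$ for the Cholesky factor of the covariance $\Sigma$.

```latex
\Sigma = \operatorname{diag}(\sigma)\, R \,\operatorname{diag}(\sigma),
\qquad R = L_R L_R^{\top},
\qquad L = \operatorname{diag}(\sigma)\, L_R,
\qquad \Sigma = L L^{\top},
\qquad \sigma_i = \sqrt{\Sigma_{ii}} = \sqrt{\sum_{j \le i} L_{ij}^{2}} .
```

So the standard deviations are folded into the Cholesky factor: each $\sigma_i$ is the norm of row $i$ of $L$.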
I think you can do the following to keep track of the sd, but I am not sure, so please verify it yourself.

sd_dist = pm.HalfNormal('sd', sd=1, shape=num_dep_vars)
packed_chol = pm.LKJCholeskyCov('chol_cov', eta=2, n=num_dep_vars, sd_dist=sd.distribution)

phildias · July 24, 2018, 5:59pm

In that second line, when you have sd_dist=sd.distribution, I think you meant sd_dist=sd_dist.distribution =)

That worked! But I’ve noticed something odd. Here are the traces for sd_dist, sd and chol_cov:

My question is: why does sd_dist look like a distribution while sd and chol_cov look like actual parameters? I’ve defined all three of these things so similarly in my model. Why are they behaving so differently?

Thanks!

junpenglao · July 24, 2018, 6:48pm

Hmm, I was wrong: sd_dist is just a free node on the graph, and the samples you see are from the prior. LKJCholeskyCov only uses the distribution's log-likelihood when evaluating its logp, so the RV sd_dist is not actually connected to any other variable in the model.

In that case, there is probably no easier way to log the sd into the trace than what you are already doing with the Deterministic.
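To make the “free node” point concrete, here is a toy sketch in plain Python. It is not PyMC3's internals; the function names and the two made-up data sets are assumptions for illustration. One scale (`free_sd`) contributes only its prior term to the joint log-density, while the other (`used_sd`) also appears in the likelihood.

```python
import math

def halfnormal_logp(x, sd=1.0):
    """Log-density of HalfNormal(sd) at x; -inf for negative x."""
    if x < 0:
        return -math.inf
    return 0.5 * math.log(2.0 / math.pi) - math.log(sd) - 0.5 * (x / sd) ** 2

def normal_logp(y, mu, sd):
    """Log-density of Normal(mu, sd) at y."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sd) - 0.5 * ((y - mu) / sd) ** 2

def joint_logp(free_sd, used_sd, data):
    """Toy joint log-density: free_sd is disconnected, used_sd drives the likelihood."""
    lp = halfnormal_logp(free_sd)   # prior only: nothing else depends on free_sd
    lp += halfnormal_logp(used_sd)  # prior for the scale the likelihood uses
    lp += sum(normal_logp(y, 0.0, used_sd) for y in data)
    return lp

def logp_change_free(data):
    """Change in joint logp when only free_sd moves from 0.5 to 2.0."""
    return joint_logp(2.0, 1.0, data) - joint_logp(0.5, 1.0, data)

def logp_change_used(data):
    """Change in joint logp when only used_sd moves from 0.5 to 2.0."""
    return joint_logp(1.0, 2.0, data) - joint_logp(1.0, 0.5, data)

small_data = [0.1, -0.2, 0.05]  # hypothetical observations
large_data = [3.0, -4.0, 5.0]   # hypothetical observations

print(logp_change_free(small_data), logp_change_free(large_data))
print(logp_change_used(small_data), logp_change_used(large_data))
```

The change in log-density for `free_sd` is the same whatever the data are, so a sampler exploring it simply reproduces the prior. The change for `used_sd` depends strongly on the data, which is what makes its trace look like a posterior.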

phildias · July 24, 2018, 7:15pm

Thanks for the clarification, @junpenglao! But my question still remains: if I defined sd_dist just like the other variables, why is PyMC drawing from its prior instead of its posterior?

I think I’m missing something super basic and silly here.

Thanks again!

junpenglao · July 24, 2018, 7:22pm

Because sd_dist is never actually used in the model. You can think of it as a potential function, which is evaluated and added to the model logp but does not leave samples in the MCMC chain (in other words, it is not a free parameter that other parameters depend on).
