# Which variables show up in sampling chain and which don't?

Hi there,

I have a section of my code where I estimate a covariance matrix using `LKJCholeskyCov`. However, this has me really confused about which randomly distributed variables in my model get a proper “name” and are explicitly sampled in my posterior chain.

Let me show what I mean. Here is an example of a section of my code:

``````
# Priors for error covariance matrix
sd_dist = pm.HalfNormal.dist(sd=1, shape=num_dep_vars)
packed_chol = pm.LKJCholeskyCov('chol_cov', eta=2, n=num_dep_vars, sd_dist=sd_dist)
chol = pm.expand_packed_triangular(num_dep_vars, packed_chol, lower=True)

# Generate covariance matrix from Cholesky factor
cov_mtx = tt.dot(chol, chol.T)
cov_terms = pm.Deterministic('cov_terms', cov_mtx[cov_term_indices])

# Extract the standard deviations from the covariance matrix diagonal
sd = pm.Deterministic('sd', tt.sqrt(tt.diag(cov_mtx)))

# Tensor manipulation used to extract the correlation terms
cor_mtx = tt.diag(sd**-1).dot(cov_mtx).dot(tt.diag(sd**-1))
cor_terms = pm.Deterministic('cor_terms', cor_mtx[cor_term_indices])
``````

So I first create a half-normally distributed set of standard deviations, build up a vector of Cholesky factors, and build up a covariance matrix. Then I start “going backwards”, extracting the variances, standard deviations, and finally the actual correlations.
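For reference, the “going backwards” step can be checked outside PyMC with plain numpy. This is just a sketch: `packed_chol` below is a hypothetical made-up vector standing in for one sampled draw, and the unpacking assumes the packed lower triangle is stored row-wise (which is what `pm.expand_packed_triangular` expects with `lower=True`, as far as I know).

```python
import numpy as np

num_dep_vars = 3
# Hypothetical packed lower-triangular Cholesky factor, row-wise:
# [L00, L10, L11, L20, L21, L22]
packed_chol = np.array([1.0, 0.5, 1.2, -0.3, 0.8, 0.9])

# Unpack into a full lower-triangular matrix
chol = np.zeros((num_dep_vars, num_dep_vars))
chol[np.tril_indices(num_dep_vars)] = packed_chol

# Covariance from the Cholesky factor, then "go backwards"
cov_mtx = chol @ chol.T
sd = np.sqrt(np.diag(cov_mtx))
cor_mtx = np.diag(sd**-1) @ cov_mtx @ np.diag(sd**-1)

# The correlation matrix has ones on its diagonal and is symmetric
print(np.round(sd, 3))
print(np.round(np.diag(cor_mtx), 3))
```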

But why do I have to do this to get to the standard deviations? Aren’t I already calculating them at the top of the code, when I say `sd_dist = pm.HalfNormal.dist(sd=1, shape=num_dep_vars)`? Why isn’t `sd_dist` being “watched” as a variable to be sampled from in the posterior, as the `packed_chol` variable is? Is it because `sd_dist` isn’t named? Also, why can I not give `sd_dist` a name?

I’m just getting a bit confused here with what gets a proper name and what gets sampled from in my posterior chains.

Thanks!

`LKJCholeskyCov` is defined in terms of the LKJ *correlation* matrix, which carries no information about the variances (its diagonal is all ones). The `sd_dist` argument supplies a prior for the scales, so that the diagonal elements of the covariance can be evaluated.
I think you can do the following to keep track of the sd, but I'm not sure - please verify yourself.

``````
sd_dist = pm.HalfNormal('sd', sd=1, shape=num_dep_vars)
packed_chol = pm.LKJCholeskyCov('chol_cov', eta=2, n=num_dep_vars, sd_dist=sd.distribution)
``````
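To unpack the point about the correlation matrix carrying no scale information, here is a minimal numpy illustration (not PyMC): the covariance is assembled as `diag(sd) @ corr @ diag(sd)`, so the scales have to come from somewhere, and that somewhere is `sd_dist`. Both `corr` and `sd` below are hypothetical made-up draws.

```python
import numpy as np

# Hypothetical LKJ-style draw: a *correlation* matrix, diagonal all ones
corr = np.array([[1.0, 0.3],
                 [0.3, 1.0]])
# Hypothetical scale draw, the role sd_dist plays in the model
sd = np.array([2.0, 0.5])

# Covariance = scales times correlation times scales
cov = np.diag(sd) @ corr @ np.diag(sd)
# diagonal of cov is sd**2; off-diagonal is corr[i, j] * sd[i] * sd[j]
```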

In that second line, when you have `sd_dist=sd.distribution`, I think you meant `sd_dist=sd_dist.distribution` =)

That worked! But I’ve noticed something odd. Here are the traces for `sd_dist`, `sd` and `chol_cov`:

My question is: why does `sd_dist` look like a distribution while `sd` and `chol_cov` look like actual parameters? I’ve defined all three of these things so similarly in my model. Why are they behaving so differently?

Thanks!

Hmm, I was wrong: `sd_dist` is just a free node on the graph, and the samples you see are from the prior. `LKJCholeskyCov` uses `sd_dist` only to evaluate its logp; the RV `sd_dist` is not actually connected to any other variable in the model.

In that case, there is probably no easier way to log `sd_dist` into the trace than what you are already doing with the deterministic.

Thanks for the clarification, @junpenglao! But my question still remains: if I defined `sd_dist` just like the other variables, why is PyMC drawing from its prior instead of its posterior?

I think I’m missing something super basic and silly here.

Thanks again!

Because `sd_dist` is never actually used in the model, you can think of it as a potential function: it is evaluated and added to the model logp, but it will not leave posterior samples in the MCMC chain (in other words, it is not a free parameter that other parameters depend on).
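A quick numeric sketch of why that makes the trace look like the prior (plain numpy, not PyMC): if the likelihood does not depend on a variable, the likelihood factor is constant in it and cancels when the posterior is normalized, so the posterior marginal for that variable equals its prior. The grid and constant below are arbitrary illustration values.

```python
import numpy as np

# Grid over a HalfNormal(1)-distributed variable that the likelihood ignores
grid = np.linspace(0.01, 5.0, 500)
prior = np.exp(-0.5 * grid**2)  # unnormalized HalfNormal(1) density

# Any likelihood that does not involve this variable is a constant here
likelihood = 0.123

unnorm_post = prior * likelihood
posterior = unnorm_post / unnorm_post.sum()
prior_normed = prior / prior.sum()
# posterior equals the prior, so MCMC draws of it are just prior draws
```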