I am trying to understand how the trace is used to calculate the log pointwise posterior predictive density (lppd). However, since I know little about how PyMC3’s internals work, it is difficult for me to understand what happens inside the “_log_post_trace” method.
I know that this is the expression for calculating the lppd using draws from p(\theta|y)
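(The expression itself is not reproduced in this post. Assuming it is the usual Monte Carlo estimate of the lppd from S posterior draws, it would read as follows, where y_i is the i-th observation, n is the number of observations, and \theta^s is the s-th draw.)

$$
\mathrm{lppd} = \sum_{i=1}^{n} \log\left( \frac{1}{S} \sum_{s=1}^{S} p(y_i \mid \theta^s) \right)
$$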
So I think inside _log_post_trace,
1. We iterate through all the samples (\theta^1, \theta^2, ..., \theta^S); for each \theta^s:
2. Compute p(y|\theta^s) using the likelihood distribution
3. Then take the average of all p(y|\theta^s)
(let’s not worry about taking log etc for now)
It is not clear to me how step 2 is done. Assume we have a linear regression model with a normal likelihood,
y\sim Normal(w.x,\sigma^2)
We can assign w = w^s and \sigma = \sigma^s in order to determine the distribution of y for a single draw s.
Then how do we get a single probability out of this distribution? Are we just taking the mean (w.x)?
Please correct me if I’m wrong at any point. I would appreciate it if someone could help me understand what happens inside this function.
Whenever you build a pymc3 model using with pm.Model() as m:..., the context manager adds the logp of each random variable to the model logp m.logp. In fact, you can pass a point to compute the logp of the model conditioned on the parameters from this point by doing: m.logp(m.test_point), or evaluate the logp of the observed RV by doing: obs.logp(m.test_point). (Notice that if you don’t have any observed RV in your model, the observed value is taken to be the default value of the RV.)
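To make the idea of "evaluating logp at a point" concrete without relying on PyMC3 itself, here is a minimal plain-Python sketch. The helper names (normal_logpdf, obs_logp, prior_logp, model_logp), the priors, and the data are all made up for illustration and are not PyMC3 API. It mimics a model whose total logp is the sum of the logp of each random variable, evaluated at a dictionary of parameter values.

```python
import math

def normal_logpdf(value, mu, sigma):
    """Log density of Normal(mu, sigma^2) evaluated at value."""
    z = (value - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

# Hypothetical data for a one-feature linear regression.
x = [0.0, 1.0, 2.0]
y = [0.1, 0.9, 2.2]

def obs_logp(point):
    """Log-likelihood of the observed y given the parameters in point."""
    return sum(normal_logpdf(yi, point["w"] * xi, point["sigma"])
               for xi, yi in zip(x, y))

def prior_logp(point):
    """Log prior; w ~ Normal(0, 10) and sigma ~ Exponential(1) are assumptions."""
    return normal_logpdf(point["w"], 0.0, 10.0) - point["sigma"]

def model_logp(point):
    """Joint logp: the sum of the logp of every random variable."""
    return prior_logp(point) + obs_logp(point)

point = {"w": 1.0, "sigma": 0.5}  # plays the role of m.test_point
print(obs_logp(point), model_logp(point))
```

A real PyMC3 model also handles details such as transformed variables, which this sketch ignores.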
So after you do inference (i.e., sample from the posterior), you get a collection of points in the trace. You can evaluate the logp of the observed data at each of these points by doing obs.logp(trace[s]), which corresponds to log p(y|\theta^s).
This was computed by:
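No, you do not take the mean of the distribution. The density is evaluated at the observed y, and you then take the mean of [obs.logp(trace[s]) for s in range(nsample)]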
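The averaging over posterior draws can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the "trace" is a hand-written list of hypothetical draws, and normal_logpdf, log_mean_exp, and pointwise_logp are made-up helpers rather than PyMC3 functions. The average is taken per observation on the probability scale (via a log-mean-exp), which is how the lppd is usually defined.

```python
import math

def normal_logpdf(value, mu, sigma):
    """Log density of Normal(mu, sigma^2) evaluated at value."""
    z = (value - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def log_mean_exp(log_values):
    """Numerically stable log(mean(exp(log_values)))."""
    m = max(log_values)
    return m + math.log(sum(math.exp(v - m) for v in log_values) / len(log_values))

# Hypothetical data and hypothetical posterior draws (stand-in for a trace).
x = [0.0, 1.0, 2.0]
y = [0.1, 0.9, 2.2]
trace = [
    {"w": 0.9, "sigma": 0.5},
    {"w": 1.0, "sigma": 0.4},
    {"w": 1.1, "sigma": 0.6},
]

def pointwise_logp(point):
    """log p(y_i | theta^s) for every i: the density evaluated at the observed y_i."""
    return [normal_logpdf(yi, point["w"] * xi, point["sigma"])
            for xi, yi in zip(x, y)]

# Rows are draws s, columns are observations i.
log_lik = [pointwise_logp(pt) for pt in trace]

# Average over draws for each observation, then sum over observations.
lppd_i = [log_mean_exp([row[i] for row in log_lik]) for i in range(len(y))]
lppd = sum(lppd_i)
print(lppd_i, lppd)
```

Note that nothing is drawn from the Normal distribution here: for each draw, w^s and \sigma^s fix the distribution, and the "single probability" is its density at the observed y_i.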
I was starting to do some refactoring of the loo function in pymc3, so you can check out this code for another perspective of how the computation is done.
Thank you for the very detailed answer @junpenglao. I’m still going through things.
I’m sorry if this question is too primitive. What do we compute using logp? Is it the conditional probability of an RV given a value? So basically, does it return the probability that the distribution assigns to a given event?
Why do we compute the log probability (instead of just the probability)?
It depends on where you are calling logp, but in this context (element-wise posterior logp), it is the conditional probability of the observed value, given the posterior samples. When you are computing the lppd, it returns the (log) probability of each observation, conditioned on the posterior samples of the parameters.
As for why log probability: it is mostly for computational reasons, as log probabilities have many nice properties. The Wikipedia page on log probability has a nice summary: https://en.wikipedia.org/wiki/Log_probability.
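One of those computational reasons, numerical underflow, can be seen in a small plain-Python sketch. The numbers are hypothetical (2000 observations, each with likelihood 0.01), chosen only to make the effect visible.

```python
import math

# Hypothetical: 2000 observations, each with likelihood 0.01 under some parameters.
probs = [0.01] * 2000

# Multiplying probabilities directly: the true value (1e-4000) is far below
# the smallest positive double, so the product underflows to 0.0.
product = 1.0
for p in probs:
    product *= p

# Summing log probabilities instead keeps the result finite and usable.
log_sum = sum(math.log(p) for p in probs)

print(product)
print(log_sum)
```

On the log scale, products of probabilities become sums, so the joint likelihood of many observations stays representable.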