Computing lppd using trace, Help to understand

Nadheesh · March 29, 2018, 1:02pm

I am trying to understand how the trace is used to calculate the log pointwise posterior predictive density (lppd). However, since I know little about how PyMC3's internals work, it is difficult for me to understand what happens inside the “_log_post_trace” method.

I know that this is the expression for calculating the lppd using draws from p(\theta|y)
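
In its usual form, assuming S posterior draws \theta^1, ..., \theta^S and n observations y_1, ..., y_n (the indices i and n are notation assumed here for illustration), the estimate reads:

```latex
\mathrm{lppd} = \sum_{i=1}^{n} \log\left( \frac{1}{S} \sum_{s=1}^{S} p(y_i \mid \theta^s) \right)
```

The inner average is taken over probabilities, and the log is applied only afterwards.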

So I think inside _log_post_trace,

1. We iterate through all the samples (\theta^1, \theta^2, ..., \theta^S); for each \theta^s:
2. Compute p(y|\theta^s) using the likelihood distribution
3. Then take the average of all p(y|\theta^s)

(let’s not worry about taking log etc for now)

It is not clear to me how step 2 is done. Assume we have a linear regression model with a normal likelihood,

y\sim Normal(w.x,\sigma^2)

We can assign w = w^s and \sigma = \sigma^s in order to determine the distribution of y for a single sample s.

Then how do we take a single probability out of this distribution? Are we taking just the mean (w.x) ?

Please correct me if I’m wrong at any point. I would appreciate it if someone could help me understand what happens inside this function.

Thanks

junpenglao · March 29, 2018, 1:22pm

Whenever you build a PyMC3 model using with pm.Model() as m:..., the context manager adds the logp of each random variable to the model logp, m.logp. In fact, you can pass a point to compute the logp of the model conditioned on the parameter values at that point by doing m.logp(m.test_point), or evaluate the logp of the observed RV by doing obs.logp(m.test_point). (Notice that if you don't have any observed variables in your model, the observed is taken as the default value of the RV.)
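
To make the idea of "the model logp is the sum of the logp of each random variable" concrete, here is a minimal hand-rolled sketch in plain Python. It does not use PyMC3: the priors (Normal(0, 10) on w, HalfNormal(5) on sigma), the data, and all function names are assumptions chosen for illustration, and it ignores the variable transformations PyMC3 applies internally.

```python
import math

def normal_logpdf(value, mu, sigma):
    """Log density of Normal(mu, sigma^2) evaluated at value."""
    return (-0.5 * math.log(2 * math.pi) - math.log(sigma)
            - 0.5 * ((value - mu) / sigma) ** 2)

def half_normal_logpdf(value, sigma):
    """Log density of HalfNormal(sigma); zero density for negative values."""
    if value < 0:
        return -math.inf
    return math.log(2) + normal_logpdf(value, 0.0, sigma)

# Hypothetical data for a regression y ~ Normal(w * x, sigma^2).
x = [0.0, 1.0, 2.0, 3.0]
y = [0.1, 1.9, 4.2, 5.8]

def prior_logp(point):
    # Assumed priors, for illustration only.
    return (normal_logpdf(point["w"], 0.0, 10.0)
            + half_normal_logpdf(point["sigma"], 5.0))

def observed_logp(point):
    # Likelihood term: density of each observed y_i given the parameters.
    return sum(normal_logpdf(yi, point["w"] * xi, point["sigma"])
               for xi, yi in zip(x, y))

def model_logp(point):
    # The joint logp is the sum of the logp of every random variable.
    return prior_logp(point) + observed_logp(point)

point = {"w": 2.0, "sigma": 0.5}  # plays the role of a test point
print(model_logp(point), observed_logp(point))
```

Here model_logp stands in for m.logp and observed_logp for obs.logp; both take a dictionary of parameter values, much like a point from the trace.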

So after you do inference (i.e., sample from the posterior), you get a collection of points in the trace. You can evaluate the logp at each posterior sample by doing obs.logp(trace[s]), which corresponds to log p(y|\theta^s).
This is computed by:

github.com

pymc-devs/pymc/blob/7493d5b61eeff58120f0d0e8b6cfbc05556c565b/pymc3/stats.py#L119


      
def _log_post_trace(trace, model=None, progressbar=False):
    """Calculate the elementwise log-posterior for the sampled trace.

    Parameters
    ----------
    trace : result of MCMC run
    model : PyMC Model
        Optional model. Default None, taken from context.
    progressbar: bool
        Whether or not to display a progress bar in the command line. The
        bar shows the percentage of completion, the evaluation speed, and

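No, you do not take the mean w.x. The likelihood is evaluated at the observed y for each posterior sample, and you then average [obs.logp(trace[s]) for s in range(nsample)] over the samples (for the lppd, the average is taken on the probability scale before taking the log).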
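
As a rough illustration of that computation for the linear regression example from the question, the sketch below evaluates the normal density at each observed y_i for every posterior draw and then averages over draws. It is plain Python, not the PyMC3 implementation: the data and the three (w, sigma) draws are made up, and the helper names are hypothetical.

```python
import math

def normal_logpdf(value, mu, sigma):
    """Log density of Normal(mu, sigma^2) evaluated at value."""
    return (-0.5 * math.log(2 * math.pi) - math.log(sigma)
            - 0.5 * ((value - mu) / sigma) ** 2)

def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))

# Hypothetical observed data and posterior draws (w^s, sigma^s).
x = [0.0, 1.0, 2.0, 3.0]
y = [0.1, 1.9, 4.2, 5.8]
draws = [(1.9, 0.5), (2.0, 0.4), (2.1, 0.6)]

# log_lik[s][i] = log p(y_i | theta^s): the density is evaluated at the
# observed y_i, with mean w^s * x_i and standard deviation sigma^s.
log_lik = [[normal_logpdf(yi, w * xi, sigma) for xi, yi in zip(x, y)]
           for w, sigma in draws]

# For each observation, average over draws on the probability scale,
# then take the log; the lppd is the sum over observations.
lppd_i = [log_mean_exp([log_lik[s][i] for s in range(len(draws))])
          for i in range(len(y))]
lppd = sum(lppd_i)
print(lppd)
```

Note that nothing is drawn from the distribution of y here: the observed values are plugged into the density, which is what makes p(y|\theta^s) a single number per observation and draw.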

I was starting to do some refactoring of the loo function in pymc3, so you can check out this code for another perspective on how the computation is done.

github.com

junpenglao/modelselection_tutorial/blob/PyMC3_nb/PyMC3_nb/loo.py

"""
Some information on inferring p_loo and Pareto k values
http://discourse.mc-stan.org/t/a-quick-note-what-i-infer-from-p-loo-and-pareto-k-values/3446
"""

from pymc3.model import modelcontext
from pymc3.diagnostics import effective_n
from pymc3 import stats as pmstat
from scipy.special import logsumexp  # moved here from scipy.misc in newer SciPy releases
import numpy as np
import pandas as pd
import matplotlib.pylab as plt

def loo(trace, model=None, reff=None, progressbar=False):
    """Calculates leave-one-out (LOO) cross-validation for out of sample
    predictive model fit, following Vehtari et al. (2015). Cross-validation is
    computed using Pareto-smoothed importance sampling (PSIS).

    Parameters
    ----------


Nadheesh · March 30, 2018, 4:06am

Thank you for the very detailed answer @junpenglao. I’m still going through things.

I’m sorry if this question is too primitive. What do we compute using the logp? Is it a conditional probability of a RV given the value? So basically, does it return the probability from the probability distribution for a given event?

Why do we compute the log probability (instead of taking just the probability)?

junpenglao · March 30, 2018, 8:19am

It depends on where you are calling logp, but in this context (element-wise posterior logp), it is the conditional probability of the observed value, given the posterior samples. When you are computing it for the lppd, it returns the (log) probability of each observation, conditioned on the posterior samples of the parameters.

As for why log probability, it is mostly for computational reasons, as logp has many nice properties. The Wikipedia page on log probability actually has a nice summary: https://en.wikipedia.org/wiki/Log_probability.
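
One of those properties can be seen in a small plain-Python sketch: very small probabilities underflow to zero in floating point, while their logs remain perfectly usable. The log-likelihood values below are hypothetical, chosen only to trigger the underflow.

```python
import math

# Hypothetical log-likelihood values of one observation across three draws.
log_p = [-800.0, -801.0, -799.5]

# On the probability scale, every value underflows to exactly 0.0,
# so a naive average carries no information (and log(0) is undefined).
probs = [math.exp(v) for v in log_p]
naive_mean = sum(probs) / len(probs)

# Working on the log scale: subtract the maximum before exponentiating,
# average, then add the maximum back (the "log-mean-exp" trick).
m = max(log_p)
stable = m + math.log(sum(math.exp(v - m) for v in log_p) / len(log_p))

print(naive_mean, stable)
```

The same trick is what functions such as logsumexp provide, which is why the per-sample values are stored as logp and only combined on the log scale.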

Nadheesh · April 2, 2018, 4:28am

Thanks @junpenglao, now that makes sense.
