Evaluate logposterior at sample points


#1

Is there a convenient / idiomatic way to evaluate logposterior (bonus: logprior/likelihood) at sample points?

(I also asked the same here)


Frequently Asked Questions
#2

There is an internal function for evaluating logp:

Regarding slowness of logp evaluation, there is a discussion here (with the solution and explanation):
https://github.com/pymc-devs/pymc3/issues/2480


Computing out-of-sample deviance
#3

Thanks for the reply.

I ended up using such a construct
lnp = np.array([mvg_model.logp(trace.point(i,chain=c)) for c in trace.chains for i in range(len(trace))])

For a trace of two chains, 2000 iterations each:

    >>> lnp.shape
    (4000,)

It is quite fast, the slowness was caused by the fact that I was calling logp method of the obesrved RV instead of the model in the previous variant.

You might want to know that the function you suggested returns something, but of wierd shape and values:
lnp_strange = pm.stats._log_post_trace(trace,mvg_model)

    lnp_strange.shape
    (2000, 2222)

Where 2000 is length of a single chain, and 2222 is the number of observations.

Here’s a gist https://gist.github.com/madanh/8bb0cff8e83c1be99ebd4147c866e218


#4

Interestingly enough, the sum of those 2222 is offset from the value returned by logp() (for the highest chain number) by an approximately constant amount. Hmmm

    >>> lnp_strange.sum(axis = 1)-lnp[2000:]
    array([ 2.77298357,  2.7728608 ,  2.77341143, ...,  2.77374797,
            2.77308282,  2.77292939])

#5

oh yes _log_post_trace returns the logp for each observation

FYI you can get much faster logp evaluation by doing:

logp = mvg_model.logp
lnp = np.array([logp(trace.point(i,chain=c)) for c in trace.chains for i in range(len(trace))])

#6

oh yes _log_post_trace returns the logp for each observation

Sorry for too many questions, but do you have an idea why they don’t add up to the value returned by logp? I have been looking into the code for the last half and hour but haven’t figured this out yet.


#7

_log_post_trace returns the logp of the observations, given the values in the trace. logp computes the posterior density (up to a constant, the evidence) of all variables, observed or not. So if you have any free variables, their posterior density will be in model.logp, but not in _log_post_trace.


#8

Uhmmm, could not wrap my head around that answer. So posterior of unobserved parameters, lets call them a, is

P(a|X) = P(X|a)*P(a)/P(X)

where X is the set of observations. If the observations are independent, then we could write.
P(a|X) = \prod_i(P(x_i|a))*P(a)/P(X)

So is this correct that what _log_post_trace returns P(x_i|a) i.e. the multiplicands of the likelihood, while logp returns the entire numerator? Or something else is the case?


#9

I hope this helps:
http://mathb.in/151684
I wrote it in mathb.in to get latex support…


#10

It did help. I see that I was sloppy in my previous post talking about probabilities (without log). Thanks.


How to save logp values during sampling?