Is there a convenient / idiomatic way to evaluate logposterior (bonus: logprior/likelihood) at sample points?
(I also asked the same here)
There is an internal function for evaluating logp:
https://github.com/pymc-devs/pymc3/blob/master/pymc3/stats.py#L129
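For example (a minimal sketch; this is an internal function, so the exact signature may change between releases):

import pymc3 as pm

# Pointwise log-likelihood of the observations for each sample in the
# trace; returns an array of shape (n_samples, n_observations).
lnp = pm.stats._log_post_trace(trace, model)

where trace and model stand for your MultiTrace and Model objects.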
Regarding the slowness of logp evaluation, there is a discussion here (with the solution and explanation):
Thanks for the reply.
I ended up using a construct like this:
lnp = np.array([mvg_model.logp(trace.point(i, chain=c)) for c in trace.chains for i in range(len(trace))])
For a trace of two chains, 2000 iterations each:
>>> lnp.shape
(4000,)
It is quite fast; the slowness was caused by the fact that in the previous variant I was calling the logp method of the observed RV instead of the model’s.
You might want to know that the function you suggested returns something, but of weird shape and values:
>>> lnp_strange = pm.stats._log_post_trace(trace, mvg_model)
>>> lnp_strange.shape
(2000, 2222)
where 2000 is the length of a single chain and 2222 is the number of observations.
Here’s a gist: https://gist.github.com/madanh/8bb0cff8e83c1be99ebd4147c866e218
Interestingly enough, the sum of those 2222 values is offset from the value returned by logp() (for the highest chain number) by an approximately constant amount. Hmmm
>>> lnp_strange.sum(axis=1) - lnp[2000:]
array([ 2.77298357,  2.7728608 ,  2.77341143, ...,  2.77374797,
        2.77308282,  2.77292939])
Oh yes, _log_post_trace returns the logp for each observation.
FYI you can get much faster logp evaluation by doing:
# Bind the compiled logp function once, outside the loop:
logp = mvg_model.logp
lnp = np.array([logp(trace.point(i, chain=c)) for c in trace.chains for i in range(len(trace))])
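(If I remember the source right, Model.logp is a property that compiles a new function each time you access it, so binding it once outside the comprehension avoids recompiling at every iteration.)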
Sorry for so many questions, but do you have an idea why they don’t add up to the value returned by logp? I have been looking into the code for the last half an hour but haven’t figured this out yet.
_log_post_trace returns the logp of the observations, given the values in the trace. logp computes the posterior density (up to a constant, the evidence) of all variables, observed or not. So if you have any free variables, their posterior density will be in model.logp, but not in _log_post_trace.
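If you want to check numerically, something like this should do it (an untested sketch; I’m assuming the free RVs expose a compiled logp just like the model does, via model.free_RVs):

point = trace.point(0, chain=trace.chains[-1])

# Joint log-density (up to the evidence) of all variables at this point:
total = mvg_model.logp(point)

# Summed per-observation log-likelihood terms for the same sample:
obs = pm.stats._log_post_trace(trace, mvg_model)[0].sum()

# The leftover should be the free variables' own log-density terms:
free = sum(var.logp(point) for var in mvg_model.free_RVs)
print(total - obs, free)  # these two should agree

That leftover should be your roughly constant ~2.773 offset.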
Uhmmm, I could not wrap my head around that answer. So the posterior of the unobserved parameters, let’s call them a, is
P(a|X) = P(X|a) * P(a) / P(X)
where X is the set of observations. If the observations are independent, then we can write
P(a|X) = \prod_i P(x_i|a) * P(a) / P(X)
So is it correct that _log_post_trace returns the P(x_i|a), i.e. the multiplicands of the likelihood, while logp returns the entire numerator? Or is something else the case?
I hope this helps:
http://mathb.in/151684
I wrote it in mathb.in to get LaTeX support…
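In short, in log form with independent observations:
\log P(a|X) = \sum_i \log P(x_i|a) + \log P(a) - \log P(X)
logp returns the log of your entire numerator (the first two terms), _log_post_trace returns the individual \log P(x_i|a) terms, and the constant \log P(X) appears in neither.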
It did help. I see that I was sloppy in my previous post, talking about probabilities (without the log). Thanks.