Hi everyone! I am trying to get the loglikelihood of the data generated by pm.sample_posterior_predictive
under the corresponding posterior samples. E.g. I want the loglikelihood under the first trace sample of the dataset simulated under the first trace sample, the loglikelihood of the second simulated dataset under the second trace sample, etc. However, the function does not behave like I would expect it to.
For instance, take the following model:
dataset_np = np.random.sample(100)
with pm.Model() as testmodel:
dataset = pm.Data('dataset', dataset_np)
a = pm.Normal('a')
pm.Normal('observed', mu=a, sigma=1, observed=dataset)
test_trace = pm.sample(return_inferencedata=True)
What I am trying to do is finding the likelihood directly while using sample_posterior_predictive
as follows:
with testmodel:
if 'datalogpt' not in testmodel.named_vars:
pm.Deterministic('datalogpt', testmodel.datalogpt)
ppt_loglik = pm.sample_posterior_predictive(
test_trace,
var_names = ['datalogpt', 'observed'],
random_seed=2
)
Now, I can manually check the loglikelihood of the original data for the first posterior sample as follows:
testmodel.datalogpt.eval({a: test_trace.posterior.a[0,0]})
which prints out
array(-96.33916817)
And we can also check the loglikelihood of the simulated data for the first posterior sample as follows:
pm.set_data({'dataset': ppt_loglik['observed'][0]}, model=testmodel)
testmodel.datalogpt.eval({a: test_trace.posterior.a[0,0]})
which prints out
array(-146.20412808)
Finally, the loglikelihood of the first posterior sample recorded in ppt_loglik
is:
ppt_loglik['datalogpt'][0]
which prints out -96.33916817158028
. But this not the loglik of the simulated data for the first posterior sample, it is the loglikelihood of the actual data!
PS: It would be possible of course to get the relevant loglikelihood by iterating over the trace and running for each index i
:
pm.set_data({'dataset': ppt_loglik['observed'][i]}, model=testmodel)
testmodel.datalogpt.eval({a: test_trace.posterior.a.flatten()[i]})
but this is very slow, in contrast to pm.sample_posterior_predictive
which is really fast.