Hi everyone! I am trying to get the loglikelihood of the data generated by pm.sample_posterior_predictive under the corresponding posterior samples. E.g. I want the loglikelihood under the first trace sample of the dataset simulated under the first trace sample, the loglikelihood of the second simulated dataset under the second trace sample, etc. However, the function does not behave like I would expect it to.
For instance, take the following model:
dataset_np = np.random.sample(100)
with pm.Model() as testmodel:
dataset = pm.Data('dataset', dataset_np)
a = pm.Normal('a')
pm.Normal('observed', mu=a, sigma=1, observed=dataset)
test_trace = pm.sample(return_inferencedata=True)
What I am trying to do is finding the likelihood directly while using sample_posterior_predictive as follows:
with testmodel:
if 'datalogpt' not in testmodel.named_vars:
pm.Deterministic('datalogpt', testmodel.datalogpt)
ppt_loglik = pm.sample_posterior_predictive(
test_trace,
var_names = ['datalogpt', 'observed'],
random_seed=2
)
Now, I can manually check the loglikelihood of the original data for the first posterior sample as follows:
testmodel.datalogpt.eval({a: test_trace.posterior.a[0,0]})
which prints out
array(-96.33916817)
And we can also check the loglikelihood of the simulated data for the first posterior sample as follows:
pm.set_data({'dataset': ppt_loglik['observed'][0]}, model=testmodel)
testmodel.datalogpt.eval({a: test_trace.posterior.a[0,0]})
which prints out
array(-146.20412808)
Finally, the loglikelihood of the first posterior sample recorded in ppt_loglik is:
ppt_loglik['datalogpt'][0]
which prints out -96.33916817158028. But this not the loglik of the simulated data for the first posterior sample, it is the loglikelihood of the actual data!
PS: It would be possible of course to get the relevant loglikelihood by iterating over the trace and running for each index i:
pm.set_data({'dataset': ppt_loglik['observed'][i]}, model=testmodel)
testmodel.datalogpt.eval({a: test_trace.posterior.a.flatten()[i]})
but this is very slow, in contrast to pm.sample_posterior_predictive which is really fast.