Hello,
I am using PyMC version 5.10.3. I don’t see the pointwise log_likelihood stored in the output InferenceData. According to Model comparison — PyMC 5.10.4 documentation, this can be added to the trace by setting idata_kwargs={"log_likelihood": True}
in pm.sample, or by calling pm.compute_log_likelihood on the trace. But either way I do not see the log_likelihood group.
Inference data with groups:
> posterior
> sample_stats
Warmup iterations saved (warmup_*).
I have a different log-likelihood computation. Since my observed values are computed in the model, I am using pm.Potential (based on TypeError: Invalid Use of Observed Data Variable) as:
tot_likelihood = pm.Potential("tot_likelihood", pm.logp(pm.Normal.dist(mu=coordinates_array, sigma=coordinates_sigmas), value=image_coordinates))
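Here is a stripped-down sketch of the setup (the coordinate computations below are placeholders standing in for my actual model):

import numpy as np
import pymc as pm

# Placeholder data; in my real model these come from an image
image_data = np.random.default_rng(0).normal(size=10)
coordinates_sigmas = 0.5

with pm.Model() as model:
    theta = pm.Normal("theta", 0.0, 1.0)

    # Both quantities are computed inside the model (placeholder computations here)
    coordinates_array = theta * np.arange(10)
    image_coordinates = image_data + 0.1 * theta

    # Likelihood added as a Potential, since image_coordinates depends on model nodes
    tot_likelihood = pm.Potential(
        "tot_likelihood",
        pm.logp(pm.Normal.dist(mu=coordinates_array, sigma=coordinates_sigmas), value=image_coordinates),
    )

    # Neither of these adds a log_likelihood group to the InferenceData
    idata = pm.sample(idata_kwargs={"log_likelihood": True})
    # pm.compute_log_likelihood(idata)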
How can I get the log_likelihood values? Any assistance in resolving this would be greatly appreciated.
A Potential is ambiguous, because it could be a prior or a likelihood term (or a combination of both), so by default PyMC does not compute the log-likelihood for models with Potentials. If your Potential behaves like a distribution, you could wrap it in an observed CustomDist via the logp kwarg.
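For example, when the observed values are a plain data array (not something computed in the model), the pattern would look roughly like this (a minimal sketch; the model and variable names are placeholders):

import numpy as np
import pymc as pm

# Placeholder observed data: a fixed array, not derived from other model nodes
observed_data = np.random.default_rng(0).normal(loc=1.0, scale=0.5, size=20)

def normal_logp(value, mu, sigma):
    # Reuse the Normal logp as the density of the CustomDist
    return pm.logp(pm.Normal.dist(mu=mu, sigma=sigma), value)

with pm.Model():
    mu = pm.Normal("mu", 0.0, 1.0)
    sigma = pm.HalfNormal("sigma", 1.0)

    # Observed CustomDist defined only through its logp
    pm.CustomDist("likelihood", mu, sigma, logp=normal_logp, observed=observed_data)

    idata = pm.sample(idata_kwargs={"log_likelihood": True})

# The pointwise log_likelihood group is now present
print(idata.log_likelihood)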
Thank you for the explanation, but I need some more clarification on how to write the CustomDist. The Potential statement is used because my observed value, image_coordinates, is computed in the model. So now if I write a CustomDist with logp defined as:
def logp(image_coordinates, coordinates_array, coordinates_sigmas):
    return pm.logp(pm.Normal.dist(mu=coordinates_array, sigma=coordinates_sigmas), value=image_coordinates)
and instead of pm.Potential I use a CustomDist defined as
pm.CustomDist("likelihood", coordinates_array, coordinates_sigmas, logp=logp, observed=image_coordinates)
I get the error “Variables that depend on other nodes cannot be used for observed data.”
Alternatively, after computing the log-likelihood with pm.Potential, how can I add this to the CustomDist logp?
TL;DR: use
pm.Normal("likelihood", mu=coordinates_array - image_coordinates, sigma=coordinates_sigmas, observed=0)
It doesn’t make sense conceptually from a Bayesian point of view to have the observed data change with the MCMC iteration, so PyMC doesn’t allow that.
From what you have said so far, it looks like you both use the model to generate observations and use the observed data to generate synthetic observations, and both of these depend on the MCMC iteration. So your goal is to have a likelihood that assigns more probability the closer these two are. You can therefore compute the difference between them and give 0 as the observed values, which also allows you to use a normal distribution directly.
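Putting it together, a minimal sketch of the whole approach (placeholder data and computations; only the structure of the likelihood trick matters here):

import numpy as np
import pymc as pm

# Placeholder data standing in for the real image
n = 10
image_data = np.random.default_rng(42).normal(size=n)
coordinates_sigmas = 0.5

with pm.Model() as model:
    theta = pm.Normal("theta", 0.0, 1.0)

    # Both quantities are computed inside the model (placeholder computations)
    coordinates_array = pm.Deterministic("coordinates_array", theta * np.arange(n))
    image_coordinates = pm.Deterministic("image_coordinates", image_data + 0.1 * theta)

    # Likelihood on the difference, with zeros (same shape as the difference) as observed values
    pm.Normal(
        "likelihood",
        mu=coordinates_array - image_coordinates,
        sigma=coordinates_sigmas,
        observed=np.zeros(n),
    )

    idata = pm.sample(idata_kwargs={"log_likelihood": True})

# The pointwise log-likelihood is now stored in the InferenceData
print(idata.log_likelihood)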