Hello,
For this question, I’m currently using PyMC v5.0.1.
This is an extension from a previous topic, but I was hoping to get some advice on using PyMC’s pm.CustomDist
(or something similar) to define the likelihood such that a posterior predictive check (ppc) can be run on the custom likelihood. I understand how pm.Potential
can be used to tack on the custom logp, but as has been outlined in numerous topics, you can’t do anything more after that.
To recap, I have summary-level concentration data (mean and standard deviation) reported from a study and I’d like to fit these data with a custom log-normally distributed log-likelihood function. The specifics of that function are in the previous topic. I’ve attempted this two different ways as outlined below.
1. Use pm.CustomDist
First, I attempted to follow the documentation on pm.CustomDist
to manually define the logp
and random
functions:
import pymc as pm
from pytensor.tensor import TensorVariable
def _lognormal_logp(value: TensorVariable, mu: TensorVariable, sigma: TensorVariable, Ni_summary: TensorVariable, lnsd2_summary: TensorVariable) -> TensorVariable:
N = np.sum(Ni_summary)
term1 = (Ni_summary*value)
term2 = (Ni_summary/2.)*pm.math.log(sigma**2)
term3 = ((Ni_summary - 1)*lnsd2_summary)/(2*sigma**2)
term4 = (Ni_summary*(value - mu)**2)/(2*sigma**2)
LL = (N/2)*pm.math.log(2*np.pi) - pm.math.sum((term1 + term2 + term3 + term4))
return LL
def _lognormal_random(mu: TensorVariable, sigma: TensorVariable, size: TensorVariable) -> TensorVariable:
return pm.LogNormal.dist(mu, sigma=sigma, size=size)
within pm.Model
, I use pm.CustomDist
:
with pm.Model as model:
" <All the required model specs>"
# lnconc_summary is the model predicted concentrations (log-transformed) for the summary-level data
# sigma_summary is the estimated model error for the summary level data (log-space)
# Ni_summary is the number of animals contributing to each observed summary-level data point
# lnsd2_summary is the observed variance (log-transformed)
# lny_summary is the observed mean (log-transformed)
pm.CustomDist("conc_summary", lnconc_summary, sigma_summary, Ni_summary, lnsd2_summary, logp=lognormal_logp, random=_lognormal_random, observed=lny_summary
Unfortunately, this implementation gets hung up on the logp
function with:
TypeError: logp() takes 2 positional arguments but 3 were given
It would appear I’m not able to pass in the additional parameters (Ni_summary
, lnsd2_summary
) to _lognormal_logp
, but perhaps I’m wrong in this interpretation.
2. Define custom log-normal distribution
The alternative approach defines a custom log-normal distribution with the desired logp
using:
class CustomLogNormal(pm.Continuous):
def __init__(self, mu, sigma, Ni_summary, lnsd2_summary, *args, **kwargs):
super(CustomLogNormal, self).__init__(*args, **kwargs)
self.mu = mu = pt.as_tensor_variable(mu)
self.sigma = sigma = pt.as_tensor_variable(sigma)
self.N = np.sum(Ni_summary)
self.Ni_summary = Ni_summary = pt.as_tensor_variable(Ni_summary) # Number of animals per time point
self.lnsd2_summary = lnsd2_summary = pt.as_tensor_variable(lnsd2_summary) # Measured variance (in log-space)
def logp(self, x):
term1 = (self.Ni_summary*value)
term2 = (self.Ni_summary/2.)*pm.math.log(self.sigma**2)
term3 = ((self.Ni_summary - 1)*self.lnsd2_summary)/(2*self.sigma**2)
term4 = (self.Ni_summary*(value - self.mu)**2)/(2*self.sigma**2)
LL = (self.N/2)*pm.math.log(2*np.pi) - pm.math.sum((term1 + term2 + term3 + term4))
return LL
def random(self, mu, sigma, size):
return pm.LogNormal.dist(mu, sigma=sigma, size=size)
Then within the model, I try to define the observed data with
with pm.Model as model:
" <All the required model specs>"
CustomLogNormal('conc_summary', mu=lnconc_summary, sigma=sigma_summary, Ni_summary=Ni_summary, lnsd2_summary=lnsd2_summary, observed=lny_summary)
This implementation give the error:
TypeError: dist() missing 1 required positional argument: 'dist_params'
This dist_params
requirement might be new and I haven’t been able to figure out how to satisfy it.
Any help here would be greatly appreciated. Ultimately, I would like to be able to use the custom logp for the summary-level data in a log-normal distribution and then run a posterior predictive check on the summary-level predictions. Appending the logp with pm.Potential
works really nicely so let me know if expanding to a ppc for summary-level data isn’t feasible.
Thanks again for all your help.