# Custom Distribution with pm.CustomDist

Hello,

For this question, I’m currently using PyMC v5.0.1.

This is an extension from a previous topic, but I was hoping to get some advice on using PyMC’s `pm.CustomDist` (or something similar) to define the likelihood such that a posterior predictive check (ppc) can be run on the custom likelihood. I understand how `pm.Potential` can be used to tack on the custom logp, but as has been outlined in numerous topics, you can’t do anything more after that.

To recap, I have summary-level concentration data (mean and standard deviation) reported from a study and I’d like to fit these data with a custom log-normally distributed log-likelihood function. The specifics of that function are in the previous topic. I’ve attempted this two different ways as outlined below.

# 1. Use pm.CustomDist

First, I attempted to follow the documentation on `pm.CustomDist` to manually define the `logp` and `random` functions:

``````import pymc as pm
from pytensor.tensor import TensorVariable

def _lognormal_logp(value: TensorVariable, mu: TensorVariable, sigma: TensorVariable, Ni_summary: TensorVariable, lnsd2_summary: TensorVariable) -> TensorVariable:
N = np.sum(Ni_summary)
term1 = (Ni_summary*value)
term2 = (Ni_summary/2.)*pm.math.log(sigma**2)
term3 = ((Ni_summary - 1)*lnsd2_summary)/(2*sigma**2)
term4 = (Ni_summary*(value - mu)**2)/(2*sigma**2)
LL = (N/2)*pm.math.log(2*np.pi) - pm.math.sum((term1 + term2 + term3 + term4))
return LL

def _lognormal_random(mu: TensorVariable, sigma: TensorVariable, size: TensorVariable) -> TensorVariable:
return pm.LogNormal.dist(mu, sigma=sigma, size=size)
``````

within `pm.Model`, I use `pm.CustomDist`:

``````with pm.Model as model:
" <All the required model specs>"

# lnconc_summary is the model predicted concentrations (log-transformed) for the summary-level data
# sigma_summary is the estimated model error for the summary level data (log-space)
# Ni_summary is the number of animals contributing to each observed summary-level data point
# lnsd2_summary is the observed variance (log-transformed)
# lny_summary is the observed mean (log-transformed)
pm.CustomDist("conc_summary", lnconc_summary, sigma_summary, Ni_summary, lnsd2_summary, logp=lognormal_logp, random=_lognormal_random, observed=lny_summary

``````

Unfortunately, this implementation gets hung up on the `logp` function with:
`TypeError: logp() takes 2 positional arguments but 3 were given`

It would appear I’m not able to pass in the additional parameters (`Ni_summary`, `lnsd2_summary`) to `_lognormal_logp`, but perhaps I’m wrong in this interpretation.

# 2. Define custom log-normal distribution

The alternative approach defines a custom log-normal distribution with the desired `logp` using:

``````class CustomLogNormal(pm.Continuous):
def __init__(self, mu, sigma, Ni_summary, lnsd2_summary, *args, **kwargs):
super(CustomLogNormal, self).__init__(*args, **kwargs)
self.mu = mu = pt.as_tensor_variable(mu)
self.sigma = sigma = pt.as_tensor_variable(sigma)
self.N = np.sum(Ni_summary)
self.Ni_summary = Ni_summary = pt.as_tensor_variable(Ni_summary) # Number of animals per time point
self.lnsd2_summary = lnsd2_summary = pt.as_tensor_variable(lnsd2_summary) # Measured variance (in log-space)

def logp(self, x):
term1 = (self.Ni_summary*value)
term2 = (self.Ni_summary/2.)*pm.math.log(self.sigma**2)
term3 = ((self.Ni_summary - 1)*self.lnsd2_summary)/(2*self.sigma**2)
term4 = (self.Ni_summary*(value - self.mu)**2)/(2*self.sigma**2)
LL = (self.N/2)*pm.math.log(2*np.pi) - pm.math.sum((term1 + term2 + term3 + term4))
return LL

def random(self, mu, sigma, size):
return pm.LogNormal.dist(mu, sigma=sigma, size=size)
``````

Then within the model, I try to define the observed data with

``````with pm.Model as model:
" <All the required model specs>"
CustomLogNormal('conc_summary', mu=lnconc_summary, sigma=sigma_summary, Ni_summary=Ni_summary, lnsd2_summary=lnsd2_summary, observed=lny_summary)
``````

This implementation give the error:
`TypeError: dist() missing 1 required positional argument: 'dist_params'`

This `dist_params` requirement might be new and I haven’t been able to figure out how to satisfy it.

Any help here would be greatly appreciated. Ultimately, I would like to be able to use the custom logp for the summary-level data in a log-normal distribution and then run a posterior predictive check on the summary-level predictions. Appending the logp with `pm.Potential` works really nicely so let me know if expanding to a ppc for summary-level data isn’t feasible.

Thanks again for all your help.

The CustomDist random function should accept the 2 summary parameters just like the logp function, even if it doesn’t use them.

Thanks for response @ricardoV94 and you’re exactly right about passing the arguments from `pm.CustomDist` into the custom logp and random functions. I’ve learned now that all variables after the `name` in `pm.CustomDist` are passed into the logp and random functions so I needed all of them to be available as inputs in the custom functions, even if those variables aren’t all used.

I updated the code with:

``````def _lognormal_logp(value, mu, sigma, Ni_summary, lnsd2_summary):
# First input is observed value
term1 = (Ni_summary*value)
term2 = (Ni_summary/2.)*pm.math.log(sigma**2)
term3 = ((Ni_summary - 1)*lnsd2_summary)/(2*sigma**2)
term4 = (Ni_summary*(value - mu)**2)/(2*sigma**2)
logp = (Ni_summary/2)*pm.math.log(2*np.pi) - (term1 + term2 + term3 + term4)
return logp

def _lognormal_random(mu, sigma, Ni_summary, lnsd2_summary, rng=None, size=None):
# Need Ni_summary and lnsd2_summary even though they aren't used
return rng.normal(loc=mu, scale=sigma, size=size)

pm.CustomDist("conc_summary", lnconc_summary, sigma_summary, Ni_summary, lnsd2_summary, logp=_lognormal_logp, random=_lognormal_random, observed=lny_summary
``````

And now everything runs nicely with `pm.CustomDist` and I’m able to rum `pm.sample_posterior_predictive` as well.

Thanks!