Custom Distribution with pm.CustomDist


For this question, I’m currently using PyMC v5.0.1.

This is an extension from a previous topic, but I was hoping to get some advice on using PyMC’s pm.CustomDist (or something similar) to define the likelihood such that a posterior predictive check (ppc) can be run on the custom likelihood. I understand how pm.Potential can be used to tack on the custom logp, but as has been outlined in numerous topics, you can’t do anything more after that.

To recap, I have summary-level concentration data (mean and standard deviation) reported from a study and I’d like to fit these data with a custom log-normally distributed log-likelihood function. The specifics of that function are in the previous topic. I’ve attempted this two different ways as outlined below.

1. Use pm.CustomDist

First, I attempted to follow the documentation on pm.CustomDist to manually define the logp and random functions:

import pymc as pm
from pytensor.tensor import TensorVariable

def _lognormal_logp(value: TensorVariable, mu: TensorVariable, sigma: TensorVariable, Ni_summary: TensorVariable, lnsd2_summary: TensorVariable) -> TensorVariable:
    N = np.sum(Ni_summary)
    term1 = (Ni_summary*value)
    term2 = (Ni_summary/2.)*pm.math.log(sigma**2)
    term3 = ((Ni_summary - 1)*lnsd2_summary)/(2*sigma**2)
    term4 = (Ni_summary*(value - mu)**2)/(2*sigma**2)
    LL = (N/2)*pm.math.log(2*np.pi) - pm.math.sum((term1 + term2 + term3 + term4))
    return LL

def _lognormal_random(mu: TensorVariable, sigma: TensorVariable, size: TensorVariable) -> TensorVariable:
    return pm.LogNormal.dist(mu, sigma=sigma, size=size)

within pm.Model, I use pm.CustomDist:

with pm.Model as model:
   " <All the required model specs>"

    # lnconc_summary is the model predicted concentrations (log-transformed) for the summary-level data
    # sigma_summary is the estimated model error for the summary level data (log-space)
    # Ni_summary is the number of animals contributing to each observed summary-level data point
    # lnsd2_summary is the observed variance (log-transformed)
    # lny_summary is the observed mean (log-transformed)
    pm.CustomDist("conc_summary", lnconc_summary, sigma_summary, Ni_summary, lnsd2_summary, logp=lognormal_logp, random=_lognormal_random, observed=lny_summary

Unfortunately, this implementation gets hung up on the logp function with:
TypeError: logp() takes 2 positional arguments but 3 were given

It would appear I’m not able to pass in the additional parameters (Ni_summary, lnsd2_summary) to _lognormal_logp, but perhaps I’m wrong in this interpretation.

2. Define custom log-normal distribution

The alternative approach defines a custom log-normal distribution with the desired logp using:

class CustomLogNormal(pm.Continuous):
    def __init__(self, mu, sigma, Ni_summary, lnsd2_summary, *args, **kwargs):
        super(CustomLogNormal, self).__init__(*args, **kwargs) = mu = pt.as_tensor_variable(mu)
        self.sigma = sigma = pt.as_tensor_variable(sigma)
        self.N = np.sum(Ni_summary)
        self.Ni_summary = Ni_summary = pt.as_tensor_variable(Ni_summary) # Number of animals per time point
        self.lnsd2_summary = lnsd2_summary = pt.as_tensor_variable(lnsd2_summary) # Measured variance (in log-space)

    def logp(self, x):
        term1 = (self.Ni_summary*value)
        term2 = (self.Ni_summary/2.)*pm.math.log(self.sigma**2)
        term3 = ((self.Ni_summary - 1)*self.lnsd2_summary)/(2*self.sigma**2)
        term4 = (self.Ni_summary*(value -**2)/(2*self.sigma**2)
        LL = (self.N/2)*pm.math.log(2*np.pi) - pm.math.sum((term1 + term2 + term3 + term4))
        return LL

    def random(self, mu, sigma, size):
        return pm.LogNormal.dist(mu, sigma=sigma, size=size)

Then within the model, I try to define the observed data with

with pm.Model as model:
   " <All the required model specs>"
   CustomLogNormal('conc_summary', mu=lnconc_summary, sigma=sigma_summary, Ni_summary=Ni_summary, lnsd2_summary=lnsd2_summary, observed=lny_summary)

This implementation give the error:
TypeError: dist() missing 1 required positional argument: 'dist_params'

This dist_params requirement might be new and I haven’t been able to figure out how to satisfy it.

Any help here would be greatly appreciated. Ultimately, I would like to be able to use the custom logp for the summary-level data in a log-normal distribution and then run a posterior predictive check on the summary-level predictions. Appending the logp with pm.Potential works really nicely so let me know if expanding to a ppc for summary-level data isn’t feasible.

Thanks again for all your help.

The CustomDist random function should accept the 2 summary parameters just like the logp function, even if it doesn’t use them.

Thanks for response @ricardoV94 and you’re exactly right about passing the arguments from pm.CustomDist into the custom logp and random functions. I’ve learned now that all variables after the name in pm.CustomDist are passed into the logp and random functions so I needed all of them to be available as inputs in the custom functions, even if those variables aren’t all used.

I updated the code with:

def _lognormal_logp(value, mu, sigma, Ni_summary, lnsd2_summary):
    # First input is observed value
    term1 = (Ni_summary*value)
    term2 = (Ni_summary/2.)*pm.math.log(sigma**2)
    term3 = ((Ni_summary - 1)*lnsd2_summary)/(2*sigma**2)
    term4 = (Ni_summary*(value - mu)**2)/(2*sigma**2)
    logp = (Ni_summary/2)*pm.math.log(2*np.pi) - (term1 + term2 + term3 + term4)
    return logp

def _lognormal_random(mu, sigma, Ni_summary, lnsd2_summary, rng=None, size=None):
    # Need Ni_summary and lnsd2_summary even though they aren't used
    return rng.normal(loc=mu, scale=sigma, size=size)

pm.CustomDist("conc_summary", lnconc_summary, sigma_summary, Ni_summary, lnsd2_summary, logp=_lognormal_logp, random=_lognormal_random, observed=lny_summary

And now everything runs nicely with pm.CustomDist and I’m able to rum pm.sample_posterior_predictive as well.