NoneType error on custom distribution

I’m using the custom SSBeta distribution defined here.

I believe this used to work in PyMC3, but I'm now getting an error with PyMC 5.

class SSBeta(pm.Beta):
    def __init__(self, alpha, beta, loc, scale, *args, **kwargs):

        transform = pm.distributions.transforms.lowerbound(loc)

        super().__init__(alpha=alpha, beta=beta, *args, **kwargs, transform=transform)

        self.scale = scale
        self.loc = loc
        self.mean += loc

    def random(self, point=None, size=None):
        return super().random() * self.scale + self.loc

    def logp(self, x):
        return super().logp((x - self.loc) / self.scale)

with pm.Model() as model:

    alpha = pm.Uniform('alpha', 0, 10)
    beta = pm.Uniform('beta', 0, 10)

    pred = SSBeta("pred",
                  alpha = alpha,
                  beta = beta,
                  loc = 2e6,
                  scale = 1.5e5,
                  observed = np.array([2.2e6, 2.08e6, 2.01e6, 2.15e6]))

    trace = pm.sample()

TypeError: 'NoneType' object is not callable

The internals have changed. You can see an example of how to implement a custom distribution here.

Since the link you provided seems to be geared toward developers, I researched alternative approaches and found the CustomDist function, which seemed like the simplest option. I wrote out the entire model for my actual use case: a mixture model based on a scaled and shifted beta, another beta, and a shifted gamma distribution. The sample_prior_predictive results looked good. However, the sample_posterior_predictive results have a lot of negative values, even though there are no negative values in either the priors or the observed data.

I noticed that CustomDist has a warning that it’s experimental and that I should expect bugs. Is this one of those scenarios or is something likely off in my parameterization?

Note that I only have 6 data points in my observed data. I'm re-running now with those 6 data points duplicated 15 times to see if that changes anything, but I'm not hopeful it will.

Can you share your model using the CustomDist? It's possible your posterior is more aligned with negative values than your prior. Does your CustomDist explicitly forbid negative values?

Here it is. The CustomDist doesn't explicitly forbid negative values, but the prior predictive plot shows no negative sampled values.

import numpy as np
import plotly.express as px
import pymc as pm

def dist_beta(a, b, loc, scale, size):
    return pm.Beta.dist(a, b, size=size) * scale + loc

def dist_gamma(a, b, loc, size):
    return pm.Gamma.dist(a, b, size=size) + loc

def logp_beta(x, a, b, loc, scale):
    return (x - loc) / scale

def logp_gamma(x, a, b, loc):
    return x - loc

sd_mult = 0.01

with pm.Model() as m:
    # Left beta
    left_a = pm.Normal('left_a', 1.70, 1.70 * sd_mult )
    left_b = pm.Normal('left_b', 1.11, 1.11 * sd_mult )
    left_loc = pm.Normal('left_loc', 2.0e6, 2.0e6 * sd_mult )
    left_scale = pm.Normal('left_scale', 1.57e5, 1.57e5 * sd_mult )
    
    # Middle beta
    middle_a = pm.Normal('middle_a', 0.75, 0.75 * sd_mult )
    middle_b = pm.Normal('middle_b', 1.11, 1.11 * sd_mult )
    middle_loc = pm.Normal('middle_loc', 2.16e6, 2.16e6 * sd_mult )
    middle_scale = pm.Normal('middle_scale', 8.6e4, 8.6e4 * sd_mult )

    gamma_a = pm.Normal('gamma_a', 344.92, 344.92 * sd_mult )
    gamma_b = pm.Normal('gamma_b', 1 / 1045, (1 / 1045) * sd_mult )
    gamma_loc = pm.Normal('gamma_loc', 1.92e6, 1.92e6 * sd_mult)

    left_beta = pm.CustomDist.dist(
        left_a,
        left_b,
        left_loc,
        left_scale,
        dist=dist_beta,
        logp = logp_beta,
        class_name='left_beta'
    )
    
    middle_beta = pm.CustomDist.dist(
        middle_a,
        middle_b,
        middle_loc,
        middle_scale,
        dist=dist_beta,
        logp = logp_beta,
        class_name='middle_beta'
    )
    
    right = pm.CustomDist.dist(
        gamma_a,
        gamma_b,
        gamma_loc,
        dist=dist_gamma,
        logp = logp_gamma,
        class_name='gamma'
    )

    w = pm.Dirichlet('w', np.ones(3))
    
    data = np.array([2.207e6, 2.064e6, 2.156e6, 2.057e6, 2.089e6, 2.205e6])
    resp = pm.Mixture('resp', w = w, comp_dists=[left_beta, middle_beta, right], observed=data)

    trace = pm.sample()

with m:
    prior = pm.sample_prior_predictive()
    post = pm.sample_posterior_predictive(trace)

Prior predictive samples look good:

px.histogram(prior.prior_predictive.resp.to_numpy().reshape(-1))

But the posteriors are strange:

px.histogram(post.posterior_predictive.resp.to_numpy().reshape(-1))

Those logp functions seem completely wrong. That means the posterior won't be consistent with the prior, and therefore the posterior predictive will also not be consistent with the prior predictive.

By the way, with those dist functions you don't actually need to provide a logp function, because PyMC can figure it out for you.

The class_name is not needed in the most recent versions either.
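
For example, a standalone CustomDist with inferred logp would look roughly like this (a minimal sketch; the parameter values and observed data are placeholders, not from your model):

import pymc as pm

def dist_beta(a, b, loc, scale, size):
    # same scaled-and-shifted Beta graph as in your dist function
    return pm.Beta.dist(a, b, size=size) * scale + loc

with pm.Model():
    # no logp argument: PyMC derives the log-probability from the dist graph
    y = pm.CustomDist("y", 2.0, 2.0, 1.0, 0.5, dist=dist_beta, observed=[1.2, 1.3, 1.4])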

I tried commenting out the logp argument as an experiment and got this error:

NotImplementedError: Logprob method not implemented for Elemwise{add,no_inplace}

If I don’t provide the logp function, how does PyMC know what to figure out? I’m trying to specify that there is a loc and scale parameter for the beta distributions and a loc parameter for the gamma. I used the CustomDist API example as a template:

from typing import Optional, Tuple

import numpy as np
import pymc as pm
from pytensor.tensor import TensorVariable

def logp(value: TensorVariable, mu: TensorVariable) -> TensorVariable:
    return -(value - mu)**2

Please provide pointers on how I should modify this.

The class_name issue is much less important, but I’m running 5.3.0 and I included it because I was getting an error without it.

Ah, it seems CustomDist with inferred logp doesn't work inside Mixture yet.

As a workaround, if you want to use PyMC's ability to figure out the logp, you can implement your logp function as:

def logp_beta(x, a, b, loc, scale):
    return pm.logp(dist_beta(a, b, loc, scale, size=x.shape), x)

As for how PyMC does it: it's basically a change of variables, e.g. logp(RV * scale, x) = logp(RV, x / scale) - log(scale). PyMC knows a couple of these rules.
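
As a quick sanity check of that rule, here is a sketch using a scaled Normal (not your model):

import numpy as np
import pymc as pm

x = pm.Normal.dist(0.0, 1.0)
y = x * 2.0  # scaled random variable

value = 1.5
lhs = pm.logp(y, value).eval()
rhs = (pm.logp(x, value / 2.0) - np.log(2.0)).eval()
np.testing.assert_allclose(lhs, rhs)  # the change-of-variables rule holds

With that workaround in place, the full model looks like this: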

import numpy as np
import pymc as pm


def dist_beta(a, b, loc, scale, size):
    return pm.Beta.dist(a, b, size=size) * scale + loc


def dist_gamma(a, b, loc, size):
    return pm.Gamma.dist(a, b, size=size) + loc


def logp_beta(x, a, b, loc, scale):
    return pm.logp(dist_beta(a, b, loc, scale, size=x.shape), x)
    

def logp_gamma(x, a, b, loc):
    return pm.logp(dist_gamma(a, b, loc, size=x.shape), x)


sd_mult = 0.01

with pm.Model() as m:
    # Left beta
    left_a = pm.Normal('left_a', 1.70, 1.70 * sd_mult)
    left_b = pm.Normal('left_b', 1.11, 1.11 * sd_mult)
    left_loc = pm.Normal('left_loc', 2.0e6, 2.0e6 * sd_mult)
    left_scale = pm.Normal('left_scale', 1.57e5, 1.57e5 * sd_mult)

    # Middle beta
    middle_a = pm.Normal('middle_a', 0.75, 0.75 * sd_mult)
    middle_b = pm.Normal('middle_b', 1.11, 1.11 * sd_mult)
    middle_loc = pm.Normal('middle_loc', 2.16e6, 2.16e6 * sd_mult)
    middle_scale = pm.Normal('middle_scale', 8.6e4, 8.6e4 * sd_mult)

    gamma_a = pm.Normal('gamma_a', 344.92, 344.92 * sd_mult)
    gamma_b = pm.Normal('gamma_b', 1 / 1045, (1 / 1045) * sd_mult)
    gamma_loc = pm.Normal('gamma_loc', 1.92e6, 1.92e6 * sd_mult)

    left_beta = pm.CustomDist.dist(
        left_a,
        left_b,
        left_loc,
        left_scale,
        dist=dist_beta,
        logp=logp_beta,
        class_name='left_beta'
    )

    middle_beta = pm.CustomDist.dist(
        middle_a,
        middle_b,
        middle_loc,
        middle_scale,
        dist=dist_beta,
        logp=logp_beta,
        class_name='middle_beta'
    )

    right = pm.CustomDist.dist(
        gamma_a,
        gamma_b,
        gamma_loc,
        dist=dist_gamma,
        logp=logp_gamma,
        class_name='gamma'
    )

    w = pm.Dirichlet('w', np.ones(3))

    data = np.array([2.207e6, 2.064e6, 2.156e6, 2.057e6, 2.089e6, 2.205e6])
    resp = pm.Mixture('resp', w=w, comp_dists=[left_beta, middle_beta, right], observed=data)

Thanks. Running this now; it's been going for 45 minutes. There are 11 parameters, so maybe it takes some time, but 45+ minutes seems like a lot. Not sure whether it's stuck somewhere in the sampling.

I think your model is very poorly specified: too many parameters, wide priors, and very few data points. You may also want to normalize your data so it's closer to unit scale, to avoid floating-point precision issues.
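
For example, something along these lines (a hypothetical rescaling; the loc/scale priors would need to be divided by the same constant):

import numpy as np

data = np.array([2.207e6, 2.064e6, 2.156e6, 2.057e6, 2.089e6, 2.205e6])
scale_const = 1e6  # hypothetical choice of rescaling constant
data_scaled = data / scale_const
# the loc/scale priors (e.g. left_loc, left_scale, gamma_loc) would be divided by the same constant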

I'm not sure the priors are that wide.

from scipy import stats

smp = stats.norm.rvs(1.7, 1.7 * 0.01, 10000)
print(smp.min(), smp.max())

1.647 1.766

When you say too many parameters, what do you propose? I have a trimodal distribution that I'm modeling with two betas and one gamma. Each distribution has 3-4 parameters. I'm open to ideas on how to simplify.

Point taken regarding normalizing the data. Regarding the small number of observed data points, I thought the implication was just that the posterior distribution won't be that different from the prior.

It completed in 200 minutes with some divergences, but the posterior result is no different from the formulation I posted earlier with the incorrect logp.