Censored Binomial GLM: Sampling fails

mschmidt87 · February 16, 2023, 7:54am

Hi,
I am trying to build a Censored Binomial GLM to model purchasing data of a product with limited capacity.
Let’s say, we have the following data-generating process:

We have a product with limited capacity, i.e., there is only a limited number of items available
We have a varying number of customers interested in the product. Each customers buys the product with a certain probability
That probability depends on some explanatory variable (e.g. the price of the product) with a sigmoidal dependency, e.g. error function.

I would like to model this with a Binomial GLM that is censored with the upper limit being the number of available items.

Here is script simulating some data and the model definition:

import pymc as pm
import numpy as np
import pandas as pd
import altair as alt
from scipy.special import erfc

rng = np.random.RandomState(123)

# Simulate some data
capacity = 50
N = 1000 # number of samples
lam = 100
customers = rng.poisson(lam=lam, size=N)
x = np.random.uniform(low=0, high=1, size=N)
a = 10.0
b = -5.0

p = 0.5 * erfc(a*x + b)

y = np.clip(rng.binomial(n=customers, p=p), 0, capacity)

# Plotting with altair
data = pd.DataFrame({'x': x, 'p': p, 'customers': customers, 'y': y})
alt.Chart(data).mark_circle().encode(x='x', y='y')

The data looks like this:

censored_binomial_GLM_data

Model definition

with pm.Model() as censored_binomial_glm:
    # Model parameters
    a = pm.Normal('a', 0.0, 10.0)
    b = pm.Normal('b', 0.0, 1.0)
    lam = pm.Exponential('lam', 0.1)

    p = 0.5 * pm.math.erfc(a*x + b)
    # We assume that the customer number is also observed
    N = pm.Poisson('num_customers', mu=lam, observed=customers)
    binomial_dist = pm.Binomial.dist(n=N, p=p)
    pm.Censored("y", binomial_dist, lower=0, upper=capacity, observed=y)

Then I try to fit the model

with censored_binomial_glm:
    trace = pm.sample(target_accept=0.999)

and up with a lot of divergences. I verified that the model works well for a very high capacity where we’re basically never hitting maximum capacity and the censoring doesn’t play any role.

Does anyone see an error in the model definition or modeling approach or do you have an advice on how to improve the parametrization, for instance?
Thanks in advance.

ricardoV94 · February 16, 2023, 8:05am

One issue I see os that N could be below the observations which would be impossible

mschmidt87 · February 17, 2023, 12:20am

Thank you for your reply. Actually, I don’t see how N could be below the observations in the model. The binomial distribution can also sample values <=N, can’t it?

cluhmann · February 17, 2023, 2:16am

I think @ricardoV94 meant that allowing N to be a random variable could lead to situations (during sampling) in which N was less than some observed value. So then your model might, roughly speaking, ask what binomial p would yield 12 successes out of 10 attempts.

mschmidt87 · February 17, 2023, 5:22am

Right, thank you, now I understood.
However, wouldn’t that be the case for any binomial GLM even without censoring? Because, without censoring and no clipping in the data, I can fit a Binomial GLM without problems.

In any case, I simplified the model to directly feed the observations into the binomial distribution, like this:

with pm.Model() as censored_binomial_glm:
    a = pm.Normal('a', 0.0, 10.0)
    b = pm.Normal('b', 0.0, 1.0)
    p = 0.5 * pm.math.erfc(a*x + b)
    binomial_dist = pm.Binomial.dist(n=customers, p=p)
    pm.Censored("y", binomial_dist, lower=0, upper=capacity, observed=y)

This doesn’t solve my problem with fitting the model, however. I still get a lot of divergences:

with censored_binomial_glm:
    trace = pm.sample(tune=2000, target_accept=0.999)

Leads to this:

ricardoV94 · February 17, 2023, 8:08am

I don’t think binomial glms with random N are very common. Have you considered a Poisson or Negative Binomial parametrized in terms of the mean and standard deviation instead?

mschmidt87 · February 21, 2023, 3:22am

Thank you for the idea. I tried to use a Censored Poisson GLM with the following code:

with pm.Model() as poisson_glm_censored:
    a = pm.HalfNormal('a', 10.0)
    b = pm.Normal('b', 0.0, 1.0)
    beta = lam * (0.5 * pm.math.erfc(a*x + b))
    poisson_dist = pm.Poisson.dist(beta)
    pm.Censored("y", poisson_dist, lower=0, upper=capacity, observed=y)

The prior simulations do look reasonable as far as I can see:

censored_poisson_prior
(blue dots = data points)

But the sampling completely fails, unfortunately:

with poisson_glm_censored:
    trace_censored = pm.sample(tune=4000, step=[pm.NUTS(target_accept=0.9,max_treedepth=30)])

(I started with default parameters for pm.sample() and tried different sets of parameters all leading to bad sampling results.)

Topic		Replies	Views
Posterior inference on a continuous variable with boolean censoring version agnostic	4	460	December 5, 2022
Modelling censored data in pymc v4: a simple Gaussian random effects example v5 development , modeling	0	385	August 30, 2022
SamplingError In Binomial Model. Practice Problem 1M21 From BMCP Book	2	301	May 10, 2023
Statistical Rethinking Ch 3 Q1 Code / observed = data errors / beginner question v3 modeling	6	442	June 2, 2022
Advice on Modeling Censored Data from a Ratio of Variables version agnostic	1	20	March 13, 2025

Censored Binomial GLM: Sampling fails

Related topics