Binomial Regression with Lognormal Noise

I am trying to expand the model on the page

https://docs.pymc.io/en/v3/pymc-examples/examples/generalized_linear_models/GLM-binomial-regression.html

by adding lognormal noise to the Binomial distribution. I feel like this might be problematic, as I am adding a continuous RV to a discrete one. Nevertheless, I tried some of the most straightforward approaches, which seem to produce only partially sensible results. The way I modify the generated data is as follows:

# Generate data
y = rng.binomial(n, p_true)
# This is the extra step
y = np.round(rng.lognormal(np.log(y), 0.2, size=y.size))

I believe the question I have at hand can be modelled with something similar to this, which is why I am trying it. The full code for data generation and modelling is below (the model is a modified version of the one on the page above):

import numpy as np
import pymc3 as pm
import arviz as az
import pandas as pd
from scipy.special import expit

rng = np.random.default_rng(1234)
# true params
β0_true = 0.7
β1_true = 0.4
# number of yes/no questions
n = 20

sample_size = 30
x = np.linspace(-10, 20, sample_size)
# Linear model
μ_true = β0_true + β1_true * x
# transformation (inverse logit function = expit)
p_true = expit(μ_true)
# Generate data
y = rng.binomial(n, p_true)

# the extra step: multiplicative lognormal noise on the counts
y = np.round(y * rng.lognormal(0, 0.2, size=y.size))
# bundle data into dataframe
data = pd.DataFrame({"x": x, "y": y})


coords = {"observation": data.index.values}

with pm.Model(coords=coords) as binomial_regression_model:

    # priors
    β0 = pm.Normal("β0", mu=0, sigma=1)
    β1 = pm.Normal("β1", mu=0, sigma=1)
    sd = pm.Exponential("sd", 1)
    # linear model
    μ = β0 + β1 * data["x"]
    p = pm.Deterministic("p", pm.math.invlogit(μ))
    # latent (unobserved) binomial counts
    yb = pm.Binomial("yb", n=n, p=p, shape=sample_size)
    # likelihood: Normal noise on the log scale around the latent counts
    pm.Normal("obs", pm.math.log(yb + 0.001), sd,
              observed=pm.math.log(data["y"] + 0.001).T, shape=sample_size)

with binomial_regression_model:
    trace = pm.sample(5000, tune=10000, return_inferencedata=True,
                      target_accept=0.95, chains=6, cores=6)

az.plot_trace(trace, var_names=["β0", "β1", "sd"])

summary = az.summary(trace, var_names=["β0", "β1", "sd"])

It seems to get a good estimate of sd and β1, but β0 comes out close to 0 (and convergence is problematic too). Obviously, dropping the Normal term and putting the observed data directly into the Binomial would be problematic as well, since I would need to set n to the maximum of y, which could be quite high compared to the real value. I am also not always in the domain where the normal approximation to the binomial is valid (which would otherwise be more natural to use here). Another interesting observation: if I increase sample_size to, say, 300, both β0 and β1 come out as 0 and sd comes out as 2 (with no convergence problems and very narrow HDIs). Am I trying to do something nonsensical here? Thanks

What you are trying to do is definitely unusual. The biggest disadvantage is that by introducing a discrete unobserved variable your model can no longer be sampled with NUTS alone.

Furthermore, conceptually, you are piling two forms of noise on top of each other (the binomial has built-in uncertainty already).

I don’t understand why n would be a problem for the likelihood but not for the unobserved variable. How big of an n are we talking about anyway? The binomial should handle pretty large values.

You can also consider less flexible forms like the Poisson or more flexible ones like the negative binomial.
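
For instance, a negative binomial version of the original example might look roughly like this (a minimal sketch reusing the linear predictor, data and n from the code above; the prior on alpha is purely illustrative):

with pm.Model() as nb_model:
    β0 = pm.Normal("β0", mu=0, sigma=1)
    β1 = pm.Normal("β1", mu=0, sigma=1)
    p = pm.math.invlogit(β0 + β1 * data["x"])
    # alpha controls the extra dispersion on top of the mean n * p
    alpha = pm.Exponential("alpha", 1.0)
    pm.NegativeBinomial("obs", mu=n * p, alpha=alpha, observed=data["y"].astype(int))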

Thanks. n would be a problem in the sense that the real n is 20, but y is inflated by the noise, so if I tried to do something with just the Binomial, of the form pm.Binomial("obs", n, p, observed=y_with_noise), I wouldn't be able to use n=20, because some observed values would be larger than 20, probably leading to a -inf log-likelihood evaluation. One way around this would be to set n to the maximum of y, but that would likely bias the model via an outlier y value with large noise. n could basically be anywhere between 1 and 1000, so I kept the original 20 from the page, as it is not too small but also probably not large enough for a reasonable normal approximation.
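
For concreteness, the direct version I am ruling out would be something like the sketch below, with n set to the maximum observed, noise-inflated count (the workaround with the outlier-bias caveat mentioned above):

with pm.Model() as direct_model:
    β0 = pm.Normal("β0", mu=0, sigma=1)
    β1 = pm.Normal("β1", mu=0, sigma=1)
    p = pm.math.invlogit(β0 + β1 * data["x"])
    # n=20 would give -inf log-likelihood whenever an observed count exceeds 20
    pm.Binomial("obs", n=int(data["y"].max()), p=p, observed=data["y"].astype(int))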

I see. Why not model the lognormal directly with the mean of the binomial: n * p, or np.log(n) + np.log(p) on the log scale?

llike = pm.Lognormal("llike", mu=mu + pm.math.log(n), sigma=sd, observed=data)

Or is the data-generating process really the one with a noisy binomial? In that case you can of course model it like that (with the caveat that you can't use NUTS alone).

Thanks for the reply. Wouldn't that only work when n is sufficiently large? I am not sure how to pinpoint the threshold, but values around 10-20 seem a bit low. Also, did you mean:

llike = pm.Lognormal("llike", mu=mu + pm.math.log(n*p), sigma=sd, observed=data)

The real data-generating mechanism is slightly more complex; I chose the one above as it seemed like the simplest close analogue. The real mechanism is more like the following:

import numpy as np
import matplotlib.pyplot as plt
import pymc3 as pm

def sigmoid(d, t):
    return 1 - 1/(1+np.exp((t-d)))


sd = 0.4

treal = np.random.randint(0,6)
preal = [sigmoid(d,treal) for d in range(9)]
nreal = 20
s = [np.random.binomial(nreal, p) for p in preal]  # what one would observe without the noise below

pnoisy = [sigmoid(np.random.lognormal(np.log(d), sd), treal) for d in range(9)]
nnoisy = np.random.lognormal(np.log(nreal), sd, size=(9,))
sobserved = [np.random.binomial(int(n), p) for n, p in zip(nnoisy, pnoisy)]  # cast the noisy n to an integer for the draw

The goal is to fit a sigmoid curve to data that look somewhat like the above and estimate treal, while also getting a reliable estimate of the uncertainty when n can be quite low (say <10). That is why I feel I should stick with the Binomial rather than resorting to its normal approximation. Having looked at this model, though, it is not noise on the final variable itself but more like noise on some independent variables. So a model like the one below does give a somewhat sensible result for the data above (modulo a lot of divergences):


with pm.Model() as model:

    t = pm.Uniform('t',0, 8)
    sd = pm.Exponential('sd_dilution', 1)

    d = np.arange(9)  # the noiseless dilution values used to generate the data
    xvals = pm.Normal('xvals', mu=d, sigma=sd, shape=len(d))

    μ = xvals - t
    θ = pm.Deterministic('θ', 1 - pm.math.sigmoid(μ))  # sigmoid(x) = 1/(1+exp(-x))

    max_count = pm.Normal('max_count', 20, 0.2)
    p = θ 

    pm.Binomial('obs', max_count, p,
                shape=(9,),
                observed=sobserved)

    trace = pm.sample(5000, tune=5000, target_accept=0.95,
                      chains=6, cores=6,
                      start={'max_count':np.max(sobserved)})

No, I meant the original form. The mu of the Lognormal is the mean of the latent Normal that is then exponentiated. Correspondingly, the log of the Binomial expectation would be log(n * p) = log(n) + log(p), and log(p) is roughly what you already had in mu. It is a bit of a simplification, because you used a logit transformation instead of a log transformation in your original example, but I think it should work much the same, just with a slightly different meaning.
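
Spelled out for your original example, it would be roughly something like this (just a sketch reusing data, n and the linear predictor from your code; the small offset on the observed counts is only there to keep zero counts inside the Lognormal support):

with pm.Model() as lognormal_mean_model:
    β0 = pm.Normal("β0", mu=0, sigma=1)
    β1 = pm.Normal("β1", mu=0, sigma=1)
    sd = pm.Exponential("sd", 1)
    μ = β0 + β1 * data["x"]
    p = pm.math.invlogit(μ)
    # location is the log of the Binomial mean: log(n * p) = log(n) + log(p)
    pm.Lognormal("llike", mu=pm.math.log(n) + pm.math.log(p), sigma=sd,
                 observed=data["y"] + 0.001)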


Anyway, taking a step back this is what I would have tried first:

with pm.Model() as m:
  logit_p = ...
  n_extra = pm.HalfNormal("n_extra")
  llike = pm.Binomial("llike", n=data + n_extra, logit_p=logit_p, observed=data)
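
Filled in with the linear predictor from your example, that sketch might look like this (only a sketch: the prior scale on n_extra is a placeholder, and I've written the inverse-logit explicitly to keep it self-contained):

with pm.Model() as m:
    β0 = pm.Normal("β0", mu=0, sigma=1)
    β1 = pm.Normal("β1", mu=0, sigma=1)
    logit_p = β0 + β1 * data["x"]
    # extra trials beyond the observed counts, shared across observations
    n_extra = pm.HalfNormal("n_extra", sigma=5)
    llike = pm.Binomial("llike", n=n_extra + data["y"].values,
                        p=pm.math.invlogit(logit_p), observed=data["y"].astype(int))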

The problem with your data generation is that you have too many sources of randomness. A model that mimics it perfectly is probably going to be underdetermined (usually this shows up as divergences or high r_hat). The same would happen if you had a model that nests three Gaussian errors. You wouldn't be able to sample something like

with pm.Model() as m:
  x = pm.Normal("x")
  y = pm.Normal("y", x)
  llike = pm.Normal("llike", y, observed=data)

Because there are multiple ways x and y can be combined to generate the same final data. So even with a true model like that, you would remove one of the unobserved noise terms. You need more structure to model different levels of unobserved noise.

My advice would be to start with a simpler model, even if “inaccurate”, and iterate from there. There is a point where you won't be able to add more complexity, and it will become clearer why.


Thanks for the insight and the answer, it was quite helpful.
