Binomial Regression with Lognormal Noise

Thanks for the reply. Wouldn’t that only work when n is sufficiently large? Not sure how to pinpoint the threshold but values around 10-20 seem a bit low. Also did you mean:

llike = pm.Lognormal("llike", mu=mu + pm.math.log(n*p), sigma=sd, observed=data)

The real data generating mechanism is slightly more complex, I choose the one above as it seemed like the closest simplest case. The real data generating mechanism is more similar to below:

import numpy as np
import matplotlib.pyplot as plt
import pymc3 as pm

def sigmoid(d, t):
    return 1 - 1/(1+np.exp((t-d)))


sd = 0.4

treal = np.random.randint(0,6)
preal = [sigmoid(d,treal) for d in range(9)]
nreal = 20
s = [np.random.binomial(20,p) for p in preal] # this is what one would observe if we did not have the noise below

pnoisy = [sigmoid(np.random.lognormal(np.log(d), sd), treal) for d in range(9)]
nnoisy = np.random.lognormal(np.log(20), sd, size=(9,))
sobserved = [np.random.binomial(n,p) for n,p in zip(nnoisy,pnoisy)]

The goal is to fit a sigmoid curve to data that somehow look like above and estimate treal while getting also a realiable estimate of the uncertainty when n can be quite low (say <10). That is why I feel like I should somehow stick to Binomial without resorting to its normal approximation. Now having looked at this model though, it is not a noise on the final variable its self but more like noise on some independent variables. So trying a model like below does give some what sensical result for the data above (modulo alot of divergences):


with pm.Model() as model:

    t = pm.Uniform('t',0, 8)
    sd = pm.Exponential('sd_dilution', 1)

    xvals = pm.Normal('xvals', xvals, sd, shape=len(xvals))

    μ = (xvals - t)
    θ = pm.Deterministic('θ', 1 + -1*pm.math.sigmoid(μ)) # sigmoid(x)=1/(1+exp(-x))

    max_count = pm.Normal('max_count', 20, 0.2)
    p = θ 

    pm.Binomial(f'obs', max_count, p,
                shape=(9,),
                observed=sobserved)

    trace = pm.sample(5000, tune=5000, target_accept=0.95,
                      chains=6, cores=6,
                      start={'max_count':np.max(sobserved)})