Thanks for the reply. Wouldn't that only work when n is sufficiently large? I'm not sure how to pinpoint the threshold, but n values around 10-20 seem a bit low. Also, did you mean:
llike = pm.Lognormal("llike", mu=mu + pm.math.log(n*p), sigma=sd, observed=data)
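To make my worry a bit more concrete, here is a rough check (purely illustrative; matching sigma to the binomial coefficient of variation is my own choice, not something from your suggestion):

import numpy as np

n, p = 10, 0.3                      # small n, like in my data
k = np.random.binomial(n, p, size=200_000)

# lognormal with median n*p and sigma chosen to roughly match the
# binomial coefficient of variation sqrt((1-p)/(n*p))
sigma = np.sqrt(np.log(1 + (1 - p) / (n * p)))
approx = np.random.lognormal(np.log(n * p), sigma, size=200_000)

print("binomial  mean/std:", k.mean(), k.std())
print("lognormal mean/std:", approx.mean(), approx.std())
# at this n the binomial is discrete and puts noticeable mass at 0 and 1,
# while the lognormal is continuous and barely reaches below 1, so the
# two disagree most where I care about it (small counts)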
The real data-generating mechanism is slightly more complex; I chose the one above because it seemed like the closest simple case. The real mechanism is closer to the following:
import numpy as np
import matplotlib.pyplot as plt
import pymc3 as pm

def sigmoid(d, t):
    # decreasing sigmoid in d: close to 1 for d << t, close to 0 for d >> t
    return 1 - 1 / (1 + np.exp(t - d))

sd = 0.4
treal = np.random.randint(0, 6)
preal = [sigmoid(d, treal) for d in range(9)]
nreal = 20
# this is what one would observe if we did not have the noise below
s = [np.random.binomial(nreal, p) for p in preal]

# lognormal noise on the dilutions d and on the number of trials n
# (for d = 0, np.log(0) is -inf, so the "noisy" d collapses back to 0 with a warning)
pnoisy = [sigmoid(np.random.lognormal(np.log(d), sd), treal) for d in range(9)]
nnoisy = np.random.lognormal(np.log(nreal), sd, size=(9,))
# the binomial needs an integer number of trials, so round the noisy n
sobserved = [np.random.binomial(int(round(n)), p) for n, p in zip(nnoisy, pnoisy)]
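For what it's worth, this is roughly how I eyeball one simulated draw (nothing beyond what the snippet above already produces):

# quick look at the simulated data (just for illustration)
ds = np.arange(9)
plt.plot(ds, s, 'o', label='noise-free counts')
plt.plot(ds, sobserved, 'x', label='observed (noisy) counts')
plt.plot(ds, [nreal * p for p in preal], '-', label='nreal * preal')
plt.xlabel('d')
plt.ylabel('count')
plt.legend()
plt.show()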
The goal is to fit a sigmoid curve to data that look roughly like the above and estimate treal, while also getting a reliable estimate of the uncertainty when n can be quite low (say <10). That is why I feel I should stick to the Binomial rather than resort to its normal approximation. Having looked at this model, though, the noise is not on the final variable itself but rather on some of the independent variables. Trying a model like the one below does give a somewhat sensible result for the data above (modulo a lot of divergences):
xvals = np.arange(9)  # the nominal dilution values

with pm.Model() as model:
    t = pm.Uniform('t', 0, 8)
    sd = pm.Exponential('sd_dilution', 1)
    # noisy "true" dilutions, centred on the nominal values
    xnoisy = pm.Normal('xnoisy', mu=xvals, sd=sd, shape=len(xvals))
    μ = xnoisy - t
    θ = pm.Deterministic('θ', 1 - pm.math.sigmoid(μ))  # sigmoid(x) = 1/(1+exp(-x))
    max_count = pm.Normal('max_count', 20, 0.2)
    pm.Binomial('obs', max_count, θ,
                shape=(9,),
                observed=sobserved)
    trace = pm.sample(5000, tune=5000, target_accept=0.95,
                      chains=6, cores=6,
                      start={'max_count': np.max(sobserved)})
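Purely as a sketch of what I might try next for the divergences, here is a non-centered version of the same noise term; I have not checked whether it actually samples better:

# untested sketch: non-centered parameterization of the noisy dilutions,
# which sometimes helps with divergences when sd is weakly identified
with pm.Model() as model_nc:
    t = pm.Uniform('t', 0, 8)
    sd = pm.Exponential('sd_dilution', 1)
    x_offset = pm.Normal('x_offset', 0, 1, shape=len(xvals))
    xnoisy = pm.Deterministic('xnoisy', xvals + sd * x_offset)
    θ = pm.Deterministic('θ', 1 - pm.math.sigmoid(xnoisy - t))
    max_count = pm.Normal('max_count', 20, 0.2)
    pm.Binomial('obs', max_count, θ, observed=sobserved)
    trace_nc = pm.sample(2000, tune=2000, target_accept=0.95)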