Bayesian inference of N of a binomial distribution

Hello, there!

I am trying to replicate, using PyMC3, the model built by Raftery in the article "Inference for the binomial N parameter: A hierarchical Bayes approach".

Problem definition

Let (x_1, \ldots, x_n) be a sample from a binomial distribution with unknown N and \theta. That is, x_i \mid N, \theta \sim Bin(N, \theta), for i = 1, \ldots, n. Assume that N has a Poisson prior distribution with mean \mu. Then, as discussed in the paper, each x_i marginally has a Poisson distribution with mean \lambda = \mu\cdot\theta.
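For reference, the marginalization over N works out as follows for a single observation (my own derivation of this standard result, not taken from the paper):

P(x_i = k) = \sum_{N \ge k} \frac{e^{-\mu}\mu^N}{N!}\binom{N}{k}\theta^k(1-\theta)^{N-k} = \frac{e^{-\mu}(\mu\theta)^k}{k!}\sum_{N \ge k}\frac{(\mu(1-\theta))^{N-k}}{(N-k)!} = \frac{(\mu\theta)^k e^{-\mu\theta}}{k!},

so x_i \sim Poisson(\mu\theta) = Poisson(\lambda) marginally.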

The paper specifies the improper prior \pi(\lambda, \theta) \propto \lambda^{-1}, and a proper alternative with \lambda \sim Gamma(\kappa_1, \kappa_2) and \theta \sim Unif(0,1) independent.

Remark: When we consider a single observation x and the improper prior, the median of the posterior distribution of N \mid x is 2x.
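As a quick numerical sanity check of that remark (my own calculation, not from the paper): under the improper prior the marginal prior on N is proportional to 1/N, so for a single observation x the posterior is p(N \mid x) \propto 1/(N(N+1)) for N \ge x, and

import numpy as np

x = 15                          # a single observed count
N = np.arange(x, 2_000_000)     # truncate the support far beyond the relevant range
post = 1.0 / (N * (N + 1.0))    # unnormalized p(N | x) under the improper prior
post /= post.sum()
print(post[N >= 2 * x].sum())   # ~0.5, i.e. the posterior median of N is about 2x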

Application

For instance, suppose we observe \boldsymbol{x} = [15, 20, 21, 23, 26]. What inferences can we make about N?

What I tried to do

Consider the code:

import numpy as np
import pymc3 as pm
import arviz as az

impala = np.array([15, 20, 21, 23, 26])

with pm.Model() as model:

    # priors: theta ~ Unif(0, 1) (written as Beta(1, 1)) and lambda ~ Gamma(1, 1)
    theta = pm.Beta("theta", 1, 1)
    lambd = pm.Gamma("lambda", 1, 1)

    # recover the Poisson mean of N from lambda = mu * theta
    mu = lambd / theta

    # N ~ Poisson(mu), bounded below by the largest observation since N >= x_max
    BoundPoisson = pm.Bound(pm.Poisson, lower=max(impala))
    N = BoundPoisson("N", mu=mu)

    # likelihood: x_i | N, theta ~ Bin(N, theta)
    x = pm.Binomial("x", n=N, p=theta, observed=impala)

    trace = pm.sample(8000, tune=8000)

    display(az.summary(trace))

I computed \mu from the expression \lambda = \mu\cdot\theta and bounded the variable N from below, since N \ge x_{\max}.

From that, I got the following messages:

Sampling 2 chains for 8_000 tune and 8_000 draw iterations (16_000 + 16_000 draws total) took 25 seconds.
The acceptance probability does not match the target. It is 0.94 but should be close to 0.8. Try to increase the number of tuning steps.
There were 251 divergences after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.66 but should be close to 0.8. Try to increase the number of tuning steps.
The estimated number of effective samples is smaller than 200 for some parameters.

I could not fit the model when the prior is improper, that is, when \lambda \sim Gamma(0,0).
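Since Gamma(0, 0) is not a valid distribution in PyMC3, one workaround I can think of (only a sketch, and I am not sure it samples well) is to put a flat positive prior on \lambda and add the \lambda^{-1} factor through a Potential, keeping the rest of the model from above:

import theano.tensor as tt

with pm.Model() as improper_model:

    theta = pm.Beta("theta", 1, 1)

    # pi(lambda) proportional to 1/lambda: flat positive prior plus a -log(lambda) potential
    lambd = pm.HalfFlat("lambda")
    pm.Potential("lambda_prior", -tt.log(lambd))

    mu = lambd / theta

    # same bounded Poisson prior on N and binomial likelihood as before
    BoundPoisson = pm.Bound(pm.Poisson, lower=max(impala))
    N = BoundPoisson("N", mu=mu)
    x = pm.Binomial("x", n=N, p=theta, observed=impala)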

Remark: The number of divergences changes each time.
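Following the warnings' suggestion, one thing I can still try is re-running with a higher target_accept (I have not verified that this removes the divergences):

with model:
    trace = pm.sample(8000, tune=8000, target_accept=0.95)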

Technically you need to treat N as another latent variable and then build the binomial from it; this will result in a mixture model.
You may look here: Integer parameters - #3 by ellie.nou13 - Modeling - The Stan Forums
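If it helps, here is a rough sketch of that marginalization in PyMC3, summing N out over a finite grid (the truncation point N_max, the hand-written log-pmfs, and the variable names are my own choices):

import numpy as np
import pymc3 as pm
import theano.tensor as tt
from scipy.special import gammaln

impala = np.array([15, 20, 21, 23, 26])
N_max = 500                                  # truncation of the sum over N; raise it if the posterior needs larger N

with pm.Model() as marginal_model:

    theta = pm.Beta("theta", 1, 1)
    lambd = pm.Gamma("lambda", 1, 1)
    mu = lambd / theta

    Ns = np.arange(impala.max(), N_max + 1)  # candidate values of N (N >= x_max)

    # log prior weight of each candidate N under Poisson(mu), written by hand
    logp_N = Ns * tt.log(mu) - mu - gammaln(Ns + 1)

    # log likelihood of the whole sample for each candidate N: sum_i log Bin(x_i | N, theta)
    n = Ns[:, None]                          # shape (len(Ns), 1)
    x = impala[None, :]                      # shape (1, len(impala))
    binom_coef = gammaln(n + 1) - gammaln(x + 1) - gammaln(n - x + 1)
    logp_x_given_N = (binom_coef + x * tt.log(theta) + (n - x) * tt.log1p(-theta)).sum(axis=1)

    # marginal log-likelihood: log sum_N p(N) p(x | N, theta), added through a Potential
    pm.Potential("marginal_loglike", pm.math.logsumexp(logp_N + logp_x_given_N))

    # N is now summed out, so NUTS handles everything
    trace_marginal = pm.sample(2000, tune=2000, target_accept=0.9)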