SamplingError In Binomial Model. Practice Problem 1M21 From BMCP Book

Hello all! I took the semester off of school in order to work through Bayesian Modeling and Computation in Python book, which is much more interesting than anything my school teaches :slight_smile:

I hit my first snag while working through practice problem 1M21 in the first chapter. It’s a multi-layer problem, asking to run various combinations of options. I first simplified and chose just one scenario:

“A store is visted by n customers ona given day. The number of customers that make a purchase Y is distributed as Bin(n,theta), where theta is the probability that a customer makes a purchase. Assume we know theta and the prior for n is Pois(4.5). Use PyMC3 to compute the posterior distribution of n for Y=5 and theta=0.5”.

I simplified the last sentence and chose just one scenario, for purposes of this post.

My code is:

import numpy as np
import pymc3 as pm

# We observe 5 purchases.
Y = 5

# Model.
with pm.Model() as model:
    # Specify the prior distribution of unknown parameter n.
    n = pm.Poisson("n", mu=4.5)
    
    # Specify known value of theta.
    theta = 0.5

    # Specify the likelihood distribution while also conditioning on the observed data.
    y_obs = pm.Binomial("y_obs", n=n, p=theta, observed=Y)

    # Sample from the posterior distribution
    idata = pm.sample(1000, return_inferencedata=True)

I expect the above code to estimate the posterior distribution of n. Instead, I get this error:


SamplingError: Initial evaluation of model at starting point failed!
Starting values:
{'n': array(4)}

Initial evaluation results:
n       -1.66
y_obs    -inf
Name: Log-probability of test_point, dtype: float64

I am feeling a hint that -1.66 customers is impossible (-inf likelihood), but I don’t see why that should even be happening since the prior distribution on n is Pois(4.5).

PyMC uses the first moment of your prior distributions as the default test point. Since n has a prior mean of 4.5, the test point will be floor(lambda) = 4. The model is testing the (log-)probability of observing 5 successes out of 4 trials, which is 0, and log(0) = -inf.

You can fix this by manually specifying an initial value for n. I think back in pymc3 it was called testval, so you can do pm.Poisson('n', mu=4.5, testval=10) (or any testval you like, as long as its bigger than 5).

You should update to the latest version of pymc if possible though!

1 Like

That got it, thank you very much!

Also, wow, did not know PyMC5 existed. My only connections to PyMC are the BMCP book (Welcome — Bayesian Modeling and Computation in Python) and the Learn Bayes Stats Podcast (https://learnbayesstats.com/).

To work through the book, I am using the given environment.yml, which uses PyMC3. I’m only on Chapter 1, but it seems like using the provided environment may be better for working through the book then venturing off on my own using the latest PyMC5.

Also, I am binging the podcast from the beginning, and am only at June 2022 so far. By that time, they were talking about PyMC4, not 5, so interested to learn how 5 came along just one year later.