Think Bayes example cannot sample from observed data

Hi everyone,
I am following the great book Think Bayes and have reached the chapter that introduces PyMC for MCMC. This code is the last example in the chapter:

with pm.Model() as model9:
    p0 = pm.Beta('p0', alpha=1, beta=1)
    p1 = pm.Beta('p1', alpha=1, beta=1)
    N = pm.DiscreteUniform('N', num_seen, 350)
    
    q0 = 1-p0
    q1 = 1-p1
    ps = [q0*q1, q0*p1, p0*q1, p0*p1]
    
    k00 = N - num_seen
    data = pm.math.stack((k00, k01, k10, k11))
    y = pm.Multinomial('y', n=N, p=ps, observed=data)

and according to the book, it should produce the following warning:

Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 2 seconds.
/home/downey/miniconda3/envs/ThinkBayes2/lib/python3.10/site-packages/pymc/backends/arviz.py:65: UserWarning: Could not extract data from symbolic observation y
  warnings.warn(f"Could not extract data from symbolic observation {obs}")
The rhat statistic is larger than 1.01 for some parameters. This indicates problems during sampling. See https://arxiv.org/abs/1903.08008 for details
The effective sample size per chain is smaller than 100 for some parameters.  A higher number is needed for reliable rhat and ess computation. See https://arxiv.org/abs/1903.08008 for details

but across many runs I always get only the following:

/usr/local/lib/python3.10/dist-packages/pymc/backends/arviz.py:60: UserWarning: Could not extract data from symbolic observation y
  warnings.warn(f"Could not extract data from symbolic observation {obs}")

and the shapes of the posterior distributions are quite different from the expected ones.

Link to the chapter: MCMC — Think Bayes
Package versions:
pymc: 5.10.4
arviz: 0.18.0
pytensor: 2.18.6

What are num_seen, k01, k10, and k11? Could it be that you forgot to define them?


They are integers:

k10 = 20 - 3
k01 = 15 - 3
k11 = 3
num_seen = k01 + k10 + k11

This is the full code:

import arviz as az
import pymc as pm


k10 = 20 - 3
k01 = 15 - 3
k11 = 3
num_seen = k01 + k10 + k11

with pm.Model() as model9:
    p0 = pm.Beta('p0', alpha=1, beta=1)
    p1 = pm.Beta('p1', alpha=1, beta=1)
    N = pm.DiscreteUniform('N', num_seen, 350)

    q0 = 1 - p0
    q1 = 1 - p1
    ps = [q0*q1, q0*p1, p0*q1, p0*p1]

    k00 = N - num_seen
    data = pm.math.stack((k00, k01, k10, k11))
    y = pm.Multinomial('y', n=N, p=ps, observed=data)
    # `options` holds sampling keyword arguments defined earlier in the notebook
    idata9 = pm.sample(1000, **options)
    az.plot_posterior(idata9)

I think it’s probably because you are passing a random variable as the observed data. To my knowledge that’s not supported (see Random variable as observation).
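For comparison, here is a minimal sketch of the supported pattern, with made-up counts: observed should be concrete data known before sampling, not an expression involving model variables like N - num_seen:

import numpy as np
import pymc as pm

counts = np.array([12, 17, 3])  # fixed data, known before sampling

with pm.Model():
    p = pm.Dirichlet('p', a=np.ones(3))
    # Supported: `observed` is a plain array, so PyMC can extract it as data
    y = pm.Multinomial('y', n=counts.sum(), p=p, observed=counts)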

So you think the example has never worked?
Think Bayes is a popular book; it would be strange if nobody had noticed this before.

I am not sure, to be honest. I am not familiar with the book (though I have indeed heard of it) or with the history of PyMC. Maybe the syntax at the time was different enough to support this; I am really not sure. Maybe one of the devs or other readers of the book can jump in.

If you use the following for N, it no longer seems to emit the warning, though I can’t speak to correctness:

N = pm.draw(pm.DiscreteUniform.dist(num_seen, 350))

The book shows the outputs of these commands, so the author has presumably run them.
Your variant does not produce any warnings, but it does not really solve the problem: pm.draw takes a single forward sample, so N becomes a fixed integer, while one of the goals of the exercise is to compute the posterior distribution of N. That posterior is never computed this way.
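One pattern that would keep N as a sampled variable is to add the multinomial log-likelihood through pm.Potential instead of an observed variable. This is only a rough sketch, not checked against the book’s posteriors:

import arviz as az
import pymc as pm

k10 = 20 - 3
k01 = 15 - 3
k11 = 3
num_seen = k01 + k10 + k11

with pm.Model() as model9b:
    p0 = pm.Beta('p0', alpha=1, beta=1)
    p1 = pm.Beta('p1', alpha=1, beta=1)
    N = pm.DiscreteUniform('N', num_seen, 350)

    q0 = 1 - p0
    q1 = 1 - p1
    ps = pm.math.stack([q0*q1, q0*p1, p0*q1, p0*p1])

    k00 = N - num_seen
    counts = pm.math.stack([k00, k01, k10, k11])

    # Add the multinomial log-likelihood of the (partly symbolic) counts
    # directly, instead of passing them as `observed`
    pm.Potential('obs', pm.logp(pm.Multinomial.dist(n=N, p=ps), counts))

    idata = pm.sample(1000)

az.plot_posterior(idata, var_names=['N'])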