Think Bayes example cannot sample from observed data

Hi everyone,
I am following the great book Think Bayes and have reached the chapter that introduces PyMC for MCMC. This code is the last example in the chapter:

with pm.Model() as model9:
    p0 = pm.Beta('p0', alpha=1, beta=1)
    p1 = pm.Beta('p1', alpha=1, beta=1)
    N = pm.DiscreteUniform('N', num_seen, 350)
    
    q0 = 1-p0
    q1 = 1-p1
    ps = [q0*q1, q0*p1, p0*q1, p0*p1]
    
    k00 = N - num_seen
    data = pm.math.stack((k00, k01, k10, k11))
    y = pm.Multinomial('y', n=N, p=ps, observed=data)

and according to the book, it should produce the following warning:

Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 2 seconds.
/home/downey/miniconda3/envs/ThinkBayes2/lib/python3.10/site-packages/pymc/backends/arviz.py:65: UserWarning: Could not extract data from symbolic observation y
  warnings.warn(f"Could not extract data from symbolic observation {obs}")
The rhat statistic is larger than 1.01 for some parameters. This indicates problems during sampling. See https://arxiv.org/abs/1903.08008 for details
The effective sample size per chain is smaller than 100 for some parameters.  A higher number is needed for reliable rhat and ess computation. See https://arxiv.org/abs/1903.08008 for details

but across many runs I always get only the following:

/usr/local/lib/python3.10/dist-packages/pymc/backends/arviz.py:60: UserWarning: Could not extract data from symbolic observation y
  warnings.warn(f"Could not extract data from symbolic observation {obs}")

and the shapes of the posterior distributions are quite different from the expected ones.

Link to the chapter: MCMC — Think Bayes
Package versions:
pymc: 5.10.4
arviz: 0.18.0
pytensor: 2.18.6

What are num_seen, k01, k10, and k11? Could it be that you forgot to define them?


They are integers:

k10 = 20 - 3
k01 = 15 - 3
k11 = 3
num_seen = k01 + k10 + k11

This is the full code:

import arviz as az
import pymc as pm


k10 = 20 - 3
k01 = 15 - 3
k11 = 3
num_seen = k01 + k10 + k11

with pm.Model() as model9:
    p0 = pm.Beta('p0', alpha=1, beta=1)
    p1 = pm.Beta('p1', alpha=1, beta=1)
    N = pm.DiscreteUniform('N', num_seen, 350)

    q0 = 1 - p0
    q1 = 1 - p1
    ps = [q0*q1, q0*p1, p0*q1, p0*p1]

    k00 = N - num_seen
    data = pm.math.stack((k00, k01, k10, k11))
    y = pm.Multinomial('y', n=N, p=ps, observed=data)
    # `options` holds sampling keyword arguments defined earlier in the notebook
    idata9 = pm.sample(1000, **options)
    az.plot_posterior(idata9)

I think it’s probably because you are passing a random variable as the observed data. To my knowledge that’s not supported (see Random variable as observation).
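For comparison, here is a minimal sketch of the supported pattern, with made-up counts: observed should be concrete data known before sampling, not an expression involving model variables like N - num_seen:

import numpy as np
import pymc as pm

counts = np.array([12, 17, 3])  # fixed data, known before sampling

with pm.Model():
    p = pm.Dirichlet('p', a=np.ones(3))
    # Supported: `observed` is a plain array, so PyMC can extract it as data
    y = pm.Multinomial('y', n=counts.sum(), p=p, observed=counts)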

So you think the example has never worked?
Think Bayes is a popular book; it would be strange if nobody had noticed this before.

I am not sure, to be honest. I am not familiar with the book (though I have indeed heard of it) or with the history of PyMC. Maybe the syntax at the time was different enough to support this; I am really not sure. Maybe one of the devs or other readers of the book can jump in.

If you use the following for N, it no longer seems to emit the warning, though I can’t speak to correctness:

N = pm.draw(pm.DiscreteUniform.dist(num_seen, 350))

The book shows the outputs of these commands, so the author has presumably run them.
Your variant does not produce any warnings, but it does not really solve the problem: pm.draw takes a single forward sample, so N becomes a fixed integer, while one of the goals of the exercise is to compute the posterior distribution of N. That posterior is never computed this way.
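One pattern that would keep N as a sampled variable is to add the multinomial log-likelihood through pm.Potential instead of an observed variable. This is only a rough sketch, not checked against the book’s posteriors:

import arviz as az
import pymc as pm

k10 = 20 - 3
k01 = 15 - 3
k11 = 3
num_seen = k01 + k10 + k11

with pm.Model() as model9b:
    p0 = pm.Beta('p0', alpha=1, beta=1)
    p1 = pm.Beta('p1', alpha=1, beta=1)
    N = pm.DiscreteUniform('N', num_seen, 350)

    q0 = 1 - p0
    q1 = 1 - p1
    ps = pm.math.stack([q0*q1, q0*p1, p0*q1, p0*p1])

    k00 = N - num_seen
    counts = pm.math.stack([k00, k01, k10, k11])

    # Add the multinomial log-likelihood of the (partly symbolic) counts
    # directly, instead of passing them as `observed`
    pm.Potential('obs', pm.logp(pm.Multinomial.dist(n=N, p=ps), counts))

    idata = pm.sample(1000)

az.plot_posterior(idata, var_names=['N'])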