SamplingError: Initial evaluation of model at starting point failed

I’m trying to rewrite the student cheating example from the book “Bayesian Methods for Hackers” under PyMC v5.0.1.

import pymc as pm
import arviz as az
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np


def cheat():
    print(f"Running on PyMC v{pm.__version__}")
    model = pm.Model()

    with model:
        N = 10
        # prior on the fraction of students who cheated
        p = pm.Uniform("cheat_rate", lower=0.0, upper=1.0, initval=0.1)
        # true (hidden) cheating status of each student
        cheat_truth = pm.Bernoulli("cheat_truth", p=p, size=N)
        # privacy coins: answer truthfully if flip1 == 1, otherwise answer flip2
        flip1 = pm.Bernoulli("flip1", p=0.5, size=N)
        flip2 = pm.Bernoulli("flip2", p=0.5, size=N)
        yes_rate = pm.Deterministic("yes_rate", (flip1 * cheat_truth + (1 - flip1) * flip2).sum() / N)
        yes_count_observed = 35
        yes_count = pm.Binomial("yes_count", n=N, p=yes_rate, observed=yes_count_observed)
        print('sample of yes_count:', pm.draw(yes_count))
        # model_to_graphviz returns a graphviz object; the return value is
        # discarded here, so nothing gets displayed
        pm.model_to_graphviz(model=model)

    with model:
        idata = pm.sample(draws=1000, discard_tuned_samples=False)
        az.plot_trace(idata, combined=True)
        print(az.summary(idata, round_to=2))

However, it always fails with the error below, even though I give an initval as the starting point. Another problem is that the model graph does not show in JupyterLab; even graph.save('a.png') gives an empty picture. Could you help me solve these problems? Thank you in advance.

SamplingError: Initial evaluation of model at starting point failed!
Starting values:
{'cheat_rate_interval__': array(-2.19722458), 'cheat_truth': array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), 'flip1': array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1]), 'flip2': array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])}

Initial evaluation results:
{'cheat_rate': -2.41, 'cheat_truth': -1.05, 'flip1': -6.93, 'flip2': -6.93, 'yes_count': -inf}

It looks like you are modeling data that consists of N=10 flips that resulted in yes_count_observed = 35 “heads”. I’m not sure if the data was coded incorrectly or what.

In addition, there seems to be a high probability that yes_rate will equal zero, which will make any observation with yes_count_observed > 0 impossible under your model. Something similar happens if yes_rate = 1 (data in which yes_count_observed < N is impossible).
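You can see the resulting -inf directly with scipy.stats (not part of your model, just an illustration of why the initial evaluation reports yes_count: -inf):

from scipy.stats import binom

# an observed count larger than n is impossible: the log-probability is -inf
print(binom.logpmf(35, 10, 0.5))   # -inf

# likewise, p=0 makes any positive count impossible
print(binom.logpmf(5, 100, 0.0))   # -inf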


Thank you for the reply. Actually, I found that N=10 is too low. However, even when I set N=100, it still fails. After many tries, I found that a meaningful initval for cheat_truth is crucial.

from scipy.stats import bernoulli

cheat_truth = pm.Bernoulli("cheat_truth", p=p, size=N)                                      # failed
cheat_truth = pm.Bernoulli("cheat_truth", p=p, size=N, initval=bernoulli.rvs(0.0, size=N))  # all zeros: failed
cheat_truth = pm.Bernoulli("cheat_truth", p=p, size=N, initval=bernoulli.rvs(1.0, size=N))  # all ones: failed
cheat_truth = pm.Bernoulli("cheat_truth", p=p, size=N, initval=bernoulli.rvs(0.5, size=N))  # mixed 0s and 1s: ok

Could you tell me why?

As I said, you have a very fragile model. You have a bunch of Bernoulli parameters (taking on values of 0 or 1) that are likely being initialized to zero, and your data is likely impossible under many of these scenarios. For example, when flip1, cheat_truth, and flip2 are all zero, then yes_rate is

(0 * 0 + (1-0) * 0).sum() / N

and the probability of a “yes” is zero, making any “yes” observations impossible.
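By the way, you can reproduce the numbers in that error message yourself. A quick sketch, assuming model is the pm.Model object built in your cheat() function:

# log-probability of each variable at the model's initial point;
# the entry that comes back -inf is what aborts sampling
print(model.point_logps())

That is essentially where the “Initial evaluation results” block in your traceback comes from.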

I’m new to PyMC and Bayesian methods. Maybe I misunderstand some core concepts of the MCMC algorithm. In my opinion, even if those Bernoullis are all 0, yes_rate=0 should be OK, since MCMC would treat this as one bad sample that makes no change to the shape of the posterior distribution. It should simply drop it and try the next sample rather than stopping the whole computation.

First off, there are many MCMC sampling algorithms. But any sampler is going to have a tough time getting started if the data is impossible at the starting point.

In general, it’s much easier to represent probabilities (e.g., p) than it is to represent the outcomes of probabilistic processes (e.g., flip1). The expected value of yes_rate is:

(0.5 * p + (1 - 0.5) * 0.5).sum() / N  # every term is identical, so this reduces to 0.5 * p + 0.25

This quantity could be plugged into the specification for yes_count directly and you would likely run into fewer issues. So you can just replace the Bernoulli parameters with their (beta-distributed) expectation and have an easier time. Something like this:

import pymc as pm

with pm.Model() as model:
    N = 100

    p = pm.Beta("cheat_rate", alpha=1, beta=1)

    # concentration of cheat_truth around p
    kappa_minus2 = pm.Gamma("kappa_minus2", 0.5, 0.5)
    kappa = pm.Deterministic("kappa", kappa_minus2 + 2)

    # per-student cheating probabilities, centered on p
    cheat_truth = pm.Beta("cheat_truth", alpha=p * (kappa - 2) + 1, beta=(1 - p) * (kappa - 2) + 1, shape=N)

    # per-student "heads" probabilities for the two privacy flips
    flip1 = pm.Beta("flip1", alpha=1, beta=1, size=N)
    flip2 = pm.Beta("flip2", alpha=1, beta=1, size=N)

    yes_rate = pm.Deterministic("yes_rate", (flip1 * cheat_truth + (1 - flip1) * flip2).sum() / N)
    yes_count_observed = 5
    yes_count = pm.Binomial("yes_count", n=N, p=yes_rate, observed=yes_count_observed)
    idata = pm.sample()
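For what it’s worth, if you take the marginalization one step further and plug the expectation 0.5 * p + 0.25 straight into the likelihood, the per-student variables disappear entirely. A minimal sketch, using N=100 and the 35 “yes” answers from your original post:

import pymc as pm

with pm.Model():
    p = pm.Uniform("cheat_rate", lower=0.0, upper=1.0)
    # expected probability of a "yes" answer under the privacy algorithm
    p_yes = pm.Deterministic("p_yes", 0.5 * p + 0.25)
    yes_count = pm.Binomial("yes_count", n=100, p=p_yes, observed=35)
    idata = pm.sample()

This is essentially the alternative model the book itself arrives at, and it should sample without any initialization trouble.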