What function generates random values when an RV is sampled?

This rookie is investigating an apparent bug in the binomial distribution, adding print statements to Binomial.random() to understand how the binomial variables are sampled in this simple model:

import pymc3 as pm
import numpy as np

with pm.Model() as model:
    scalar_binomial = pm.Binomial('scalar_binomial', n=1, p=0.25)
    arrayed_binomial = pm.Binomial('arrayed_binomial', n=1, p=0.25, shape=10)
    arrayed_binomial_alt = pm.Binomial(
        'arrayed_binomial_alt', n=np.ones(10), 
        p=np.full(10, 0.25), shape=10)
    
    trace = pm.sample(draws=600, chains=2, tune=500) 

Much to my surprise, this simple model never makes a call to Binomial.random(); nothing is ever printed. Adding print statements to Binomail.logp() reveals that calls are in fact made to that function (as expected), but only during model definition, not during the call to pm.sample():

How are random samples for the binomials generated, if not via Binomial.random() or Binomial.logp()?

I doubt if this question depends on the version of PyMC3, as it appears to be something basic I am not understanding. But just in case, I am running commit bf734fc2, the most recent as of today.

The samples are generated via Metropolis-Hastings proposals which don’t make use of the functionality for sampling directly from the binomial. The probability of accepting the proposal does indeed depend upon the logp method as you mentioned. I believe that the logp is only called when the computation graph is being built. After that, it’s handled in Theano.

You may wonder why not tell the model to use random - the issue is that for all but the most trivial cases (i.e. sampling from a very simple prior), this method isn’t an option. Suppose that we added a node to this model along the lines of x = pm.Normal('x', mu=scalar_binomial, sd=1.0. observed=0.75). This is leads to a coherent probability model for sure, but using .random here would lead to the wrong results as it wouldn’t take the likelihood (dependent on x) into account.

TL;DR: Sampling directly makes sense, but it’s not a solution that works in all cases. Using the Metropolis algorithm to sample from a known distribution is slower, but it works just as well as it does for densities with unknown normalizing constants.

2 Likes

Got it. Thanks.