What function generates random values when an RV is sampled?

bridgeland · November 17, 2020, 1:54pm

This rookie is investigating an apparent bug in the binomial distribution, adding print statements to Binomial.random() to understand how the binomial variables are sampled in this simple model:

import pymc3 as pm
import numpy as np

with pm.Model() as model:
    scalar_binomial = pm.Binomial('scalar_binomial', n=1, p=0.25)
    arrayed_binomial = pm.Binomial('arrayed_binomial', n=1, p=0.25, shape=10)
    arrayed_binomial_alt = pm.Binomial(
        'arrayed_binomial_alt', n=np.ones(10), 
        p=np.full(10, 0.25), shape=10)
    
    trace = pm.sample(draws=600, chains=2, tune=500)

Much to my surprise, this simple model never makes a call to Binomial.random(); nothing is ever printed. Adding print statements to Binomail.logp() reveals that calls are in fact made to that function (as expected), but only during model definition, not during the call to pm.sample():

How are random samples for the binomials generated, if not via Binomial.random() or Binomial.logp()?

I doubt if this question depends on the version of PyMC3, as it appears to be something basic I am not understanding. But just in case, I am running commit bf734fc2, the most recent as of today.

ckrapu · November 17, 2020, 7:10pm

The samples are generated via Metropolis-Hastings proposals which don’t make use of the functionality for sampling directly from the binomial. The probability of accepting the proposal does indeed depend upon the logp method as you mentioned. I believe that the logp is only called when the computation graph is being built. After that, it’s handled in Theano.

You may wonder why not tell the model to use random - the issue is that for all but the most trivial cases (i.e. sampling from a very simple prior), this method isn’t an option. Suppose that we added a node to this model along the lines of x = pm.Normal('x', mu=scalar_binomial, sd=1.0. observed=0.75). This is leads to a coherent probability model for sure, but using .random here would lead to the wrong results as it wouldn’t take the likelihood (dependent on x) into account.

TL;DR: Sampling directly makes sense, but it’s not a solution that works in all cases. Using the Metropolis algorithm to sample from a known distribution is slower, but it works just as well as it does for densities with unknown normalizing constants.

bridgeland · November 17, 2020, 7:47pm

Got it. Thanks.

Topic		Replies	Views
A bug with Binomial on non-scalar arguments? Questions	8	777	November 20, 2020
Binomial Regression with Lognormal Noise v3 modeling	6	512	January 26, 2023
What does the pymc3.Binomial function do when the parameter n receives a list of values? Questions modeling	2	724	January 26, 2022
Random variable as observation Questions	9	1427	January 20, 2023
Meaning of model.logp, a beginner question Questions	4	947	July 12, 2019

What function generates random values when an RV is sampled?

Related topics