Do I have to use pm.Bernoulli or pm.Binomial?

No matter how hard I look, I can't see any difference between these two models, whose code is reproduced below:

import numpy as np
import pymc as pm
import arviz as az

observed_data = np.array([1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0,
                          1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0,
                          1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1,
                          1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0,
                          1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0,
                          1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1,
                          0, 1, 0, 1, 1, 0, 1])

First:

“”"
Bernoulli model
“”"
with pm.Model() as coin_flip_Bern:

prior_p = pm.Beta("p", alpha = 1, beta = 1)

obs_data = pm.MutableData("observed", observed_data)
likelihood = pm.Bernoulli("likelihood", 
                          p = prior_p, 
                          observed = obs_data)

idata_Bern = pm.sample()

Second:

“”"
Binomial model
“”"
with pm.Model() as coin_flip_Bin:

prior_p = pm.Beta("p", alpha = 1, beta = 1)

obs_data = pm.MutableData("observed", observed_data)
likelihood = pm.Binomial("likelihood", 
                         n = 73,
                         p = prior_p, 
                         observed = 46)

idata_Bin = pm.sample()

Do you know of any documents that explain when you should choose pm.Bernoulli and when you should instead choose pm.Binomial, depending on the situation? To me, the PyMC web page is totally unclear on this point…

Bernoulli shows up in practice when the probability of success varies from trial to trial, for instance in a logistic regression model.

When the probability is the same for every trial, a Bernoulli likelihood is equivalent to a Binomial with n equal to the number of trials. In that case the Binomial is probably more efficient to compute.

Hi Ricardo,

If I understand correctly, in the case mentioned above one could simply "collapse" the observed_data array into a pair (n, observed) - where n is the number of trials in the experiment and observed is the number of successes - and then use a Binomial model.

True?

Now, doing so, the line obs_data = pm.MutableData("observed", observed_data) obviously becomes completely superfluous, as the array of observed data only enters the model through its "compacted" values n=73 and observed=46.
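Something like this minimal sketch is what I have in mind (the model name coin_flip_collapsed is just mine for illustration), computing n and the success count directly from the array:

# collapse the 0/1 array into (number of trials, number of successes)
n_trials = observed_data.size           # 73
n_successes = int(observed_data.sum())  # 46

with pm.Model() as coin_flip_collapsed:
    prior_p = pm.Beta("p", alpha=1, beta=1)
    likelihood = pm.Binomial("likelihood", n=n_trials, p=prior_p,
                             observed=n_successes)
    idata_collapsed = pm.sample()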

Ok, but could you expand a bit on your sentence "when the probability of success varies from trial to trial"?

Suppose, as an example, I had many experiments with different n and different observed, say: n = [73, 100, 25, 125, 98] and observed = [46, 80, 5, 38, 13].

What would be your suggestion to cope with these data? Should we add up all the n into n_big = 73+100+25+125+98 and all the observed into observed_big = 46+80+5+38+13, and then use a "classical" Binomial model? Or is there a more refined way of dealing with this within a PyMC model?

I mean when it varies across the individual n=1 trials. That's when you would use a Bernoulli.

pm.Bernoulli("llike", p=[0.1, 0.9, ..., 0.7], observed=[0, 1, ..., 0])

Each p maps to a single observed failure or success.

You could use Binomial("llike", n=1, p=[0.1, 0.9, ..., 0.7], observed=[0, 1, ..., 0]) (see below), but this would be less efficient than the Bernoulli.
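As a rough sketch of the varying-p case (the predictor x and all the names here are made up purely for illustration), a logistic regression gives each n=1 trial its own probability:

import numpy as np
import pymc as pm

x = np.random.normal(size=73)  # made-up predictor, one value per trial
y = observed_data              # the 0/1 outcomes from the first post

with pm.Model() as logistic_model:
    intercept = pm.Normal("intercept", mu=0, sigma=1)
    slope = pm.Normal("slope", mu=0, sigma=1)
    # each trial gets its own success probability through the inverse-logit link
    p = pm.Deterministic("p", pm.math.invlogit(intercept + slope * x))
    pm.Bernoulli("llike", p=p, observed=y)
    idata_logistic = pm.sample()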

In that case you can pass a vector of n and observed to create a “batched distribution or likelihood”.

pm.Binomial("llike", n=[73, 100, ... 98], p=..., observed=[46, 80, ..., 13])

Here p can be a single number or a vector of the same length as n and observed. Each entry of n maps to the corresponding entry of observed.
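With the numbers from your example, a minimal sketch of that batched model (a single shared p; the variable names are just illustrative) could look like:

n = np.array([73, 100, 25, 125, 98])
k = np.array([46, 80, 5, 38, 13])

with pm.Model() as batched_binomial:
    p = pm.Beta("p", alpha=1, beta=1)           # one shared p across experiments
    pm.Binomial("llike", n=n, p=p, observed=k)  # one Binomial term per experiment
    idata_batched = pm.sample()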

This guide might be useful to understand batched distributions: Distribution Dimensionality — PyMC 5.10.0 documentation


About which one to use: Bernoulli when you are modelling the probability of each n=1 trial, Binomial when n>1 trials share the same p.

Finally, you can represent multiple Binomials compactly by working with batch dimensions.


Thank you Ricardo, that clears up some troublesome ambiguities on the subject. I'm going to delve into the "Distribution Dimensionality" doc you mention.

By the way, where did you find the argument "llike"? I can't find it anywhere in the PyMC documentation…

#########
Sorry, I just realised… it's just a shorthand for "likelihood"…


It's actually just the name he gave to the random variable, like beta = pm.Normal('beta', mu=0, sigma=1). You could write anything there; I guess he wrote "llike" to emphasize that this is the term in the model that has the observed data, and so contributes p(data | theta) – the likelihood – to Bayes' rule.


Understood!