Beta-Binomial conjugate prior -- pm.Binomial buggy results...?

I’m running into what I think might be a bug with PyMC3’s Binomial likelihood. Here’s the example.

1. Simulate data

```python
import numpy as np

itrs = 100
p = 0.4
W = []  # wins per experiment
L = []  # losses per experiment
N = []  # trials per experiment

for i in range(itrs):
    n = int(np.random.normal(loc=100, scale=20))
    wins = np.random.binomial(n=n, p=p)
    losses = n - wins
    N.append(n)
    W.append(wins)
    L.append(losses)
```

2. Define a model

```python
import pymc3 as pm

with pm.Model() as m1:
    alpha = pm.HalfNormal('alpha', sigma=1)
    beta = pm.HalfNormal('beta', sigma=1)
    theta = pm.Beta('theta', alpha, beta)
    lik = pm.Binomial('likelihood', n=N, p=theta, observed=W)
    trace1 = pm.sample(chains=4)
```

3. Examine parameter estimates

```python
pm.traceplot(trace1, compact=False)
```


Note: Look at the distribution of theta.

4. Use the mean inferred alpha and beta parameters to visualize P(\theta|X).

```python
import random
import matplotlib.pyplot as plt
from scipy import stats

def sample_post_plot(samples=100, trace_obj=trace1):
    X = np.linspace(0, 1, 1000)
    alphas = trace_obj.get_values('alpha')
    betas = trace_obj.get_values('beta')
    for i in range(samples):
        # random.randint is inclusive on both ends, so subtract 1
        # to avoid an occasional IndexError
        idx = random.randint(0, len(alphas) - 1)
        Y = stats.beta(a=alphas[idx], b=betas[idx]).pdf(X)
        plt.plot(X, Y)

sample_post_plot(trace_obj=trace1)
```


Note: This is a completely different distribution from the theta posterior plot in step 3.

5. Reference expectation E[P(\theta|E[\alpha], E[\beta])]

```python
import arviz as az

post_means = az.summary(trace1).to_dict()['mean']
np.average(np.random.beta(a=post_means['alpha'],
                          b=post_means['beta'],
                          size=1000))
```

Output: `0.4701658349492111`
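As an aside, the Monte Carlo average above has a closed form: the mean of a Beta(\alpha, \beta) distribution is \alpha/(\alpha+\beta), so you can sanity-check the sampled average analytically. A minimal sketch (the alpha/beta values here are made up for illustration, not taken from the trace):

```python
import numpy as np

# E[theta] for theta ~ Beta(alpha, beta) is alpha / (alpha + beta)
alpha_hat, beta_hat = 1.2, 1.3  # illustrative values, not real posterior means
analytic_mean = alpha_hat / (alpha_hat + beta_hat)

rng = np.random.default_rng(0)
mc_mean = rng.beta(alpha_hat, beta_hat, size=100_000).mean()
print(analytic_mean, mc_mean)  # agree to roughly two decimal places
```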


Note: So both the shape and expectation of P(\theta|\alpha, \beta) differ from PyMC3’s posterior plots. What’s perplexing to me is that E[\theta]=0.403 is extremely close to the p value we set in step 1 (0.4). So posterior inference on theta is correct; it’s just that posterior inference on alpha and beta is completely off.

What’s going on here? I’m hesitant to think this is truly a PyMC3 bug; it’s more likely operator error.

Edit: I’ve discovered that this {alpha, beta, theta} disagreement, for lack of a better term, goes away when I use a Beta-Binomial likelihood instead. This post is already quite long, so for the sake of brevity, I’ve linked a Google Colab with all relevant code.

Nonetheless, I’m still really curious: what is going on here?

I might be overlooking the obvious issue, but I don’t think you have enough information to constrain your hyperparameters alpha and beta. You have one single theta with which they have to agree, so I would expect their posteriors not to differ much from the priors.

Similarly, your theta posterior would be much more uncertain if you had a single observation in the binomial (that is, n=1).

Remember that the hyperpriors are conditionally independent of the likelihood once we condition on theta.
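To make that conditional-independence point explicit (a sketch of the factorization, using the model’s own symbols): the model’s graph is (\alpha, \beta) \to \theta \to W, so the data inform \alpha and \beta only through \theta:

```latex
p(\alpha, \beta \mid \theta, W)
  \propto p(W \mid \theta)\, p(\theta \mid \alpha, \beta)\, p(\alpha, \beta)
  \propto p(\theta \mid \alpha, \beta)\, p(\alpha, \beta)
  \propto p(\alpha, \beta \mid \theta)
```

With a single theta, this is like fitting a Beta distribution to one data point, so the posteriors on alpha and beta stay close to their priors.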

So, there are 100 [N, L, W] sets (N trials, L losses, W wins), and pm.Binomial is receiving the N and W arrays, each of length 100. When I swapped out the Binomial likelihood for the Beta-Binomial likelihood, the issue went away. What I can’t figure out is… why?

For a more extensive conversation, see my Cross Validated Stack Exchange post.

The beta-binomial corresponds to a model where each of those binomial observations is given its own theta prior. So your theta should have shape 100 for those to be equivalent models.

Just to be clear, you’re referring to pm.BetaBinomial(), right?

That is:

```python
with pm.Model() as m1:
    alpha = pm.HalfNormal('alpha', sigma=1)
    beta = pm.HalfNormal('beta', sigma=1)
    theta = pm.Beta('theta', alpha, beta, shape=len(W))
    lik = pm.Binomial('likelihood', n=N, p=theta, observed=W)
    trace1 = pm.sample(chains=4)
```


Yes

So, your code used pm.Binomial(), not pm.BetaBinomial(). I ran it as listed and it worked perfectly! Was this just a typo in this comment?

I am not sure where I made a typo. A beta-binomial distribution or model refers to this kind of model, where each observation is modelled with a binomial likelihood whose p parameter is in turn given a unique beta prior with shared alpha/beta hyperpriors.

This:

```python
with pm.Model() as m1:
    alpha = pm.HalfNormal('alpha', sigma=1)
    beta = pm.HalfNormal('beta', sigma=1)
    theta = pm.Beta('theta', alpha, beta, shape=len(W))
    lik = pm.Binomial('likelihood', n=N, p=theta, observed=W)
```


And this, which is therefore equivalent:

```python
with pm.Model() as m1:
    alpha = pm.HalfNormal('alpha', sigma=1)
    beta = pm.HalfNormal('beta', sigma=1)
    lik = pm.BetaBinomial('likelihood', n=N, alpha=alpha, beta=beta, observed=W)
```


Except theta is marginalized out in the latter.
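That marginalization can be checked numerically: the Beta-Binomial pmf is exactly the Binomial pmf with theta integrated out against its Beta prior. A small sketch with scipy (the n, k, alpha, beta values here are arbitrary illustrations):

```python
import numpy as np
from scipy import stats, integrate

# Arbitrary example values, not taken from the thread's model
n, k, a, b = 50, 20, 2.0, 3.0

# Integrate Binomial(k | n, theta) * Beta(theta | a, b) over theta in [0, 1]
marginal, _ = integrate.quad(
    lambda th: stats.binom.pmf(k, n, th) * stats.beta.pdf(th, a, b), 0, 1
)

# Compare against scipy's closed-form Beta-Binomial pmf
direct = stats.betabinom.pmf(k, n, a, b)
print(marginal, direct)  # the two agree to numerical precision
```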


I get it now! Thanks!