Interpreting beta-binomial parameterisation/posterior

Hello all,

I am just getting started with PyMC3, so apologies if this is a basic question!

I am working with some COVID-19 data: for a sample of countries, I have the number of cases and the number of deaths. I also have two predictors that are thought to influence the number of deaths. I've structured my model as a hierarchical beta-binomial, where for each country the predictors determine the mean of a beta distribution, which is then fed to the binomial likelihood. A fully reproducible example is below:

import arviz as az
import pandas as pd
import pymc3 as pm
import numpy as np
np.random.seed(32)

# Retrieve data
data = pd.read_csv('https://osf.io/pf9te/download', index_col='country')

# Set up model
with pm.Model() as bebi:

    # Set priors on linear coefficients
    β0 = pm.Normal('intercept', 0, 3)
    β1 = pm.Normal('male attr', 0, 3)
    β2 = pm.Normal('female attr', 0, 3)

    # Prior on the spread (sigma) of each country's beta distribution
    σ = pm.Exponential('Beta Variance', 50)

    # Compute the mean of each country's beta distribution
    μ = pm.math.invlogit(β0*data['constant'] + β1*data['male_attr'] + β2*data['female_attr'])

    # Use the above to create a beta prior
    ps = pm.Beta('predicted % death', mu=μ, sigma=σ, shape=data.shape[0])

    # Likelihood is binomial
    y = pm.Binomial('estimated deaths', n=data['cases'].values, p=ps, observed=data['deaths'])

    # Inference button
    trace = pm.sample(10_000, tune=1000, cores=8)
    ppc = pm.sample_posterior_predictive(trace, samples=2000)

This seems to make sense - the model returns a sensible-looking PPC - but I am still unsure whether I have formulated this correctly, so any advice is appreciated!
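For reference, a rough sketch of how I inspect the PPC visually (assuming arviz's from_pymc3 converter, which needs the model passed explicitly when called outside the model context):

# Bundle the trace and posterior predictive into an InferenceData object,
# then overlay posterior predictive draws on the observed data
idata = az.from_pymc3(trace=trace, posterior_predictive=ppc, model=bebi)
az.plot_ppc(idata, var_names=['estimated deaths'])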

My second question, assuming the above is sensible, concerns the interpretation of an aspect of the posterior. For the purposes of a paper, I am interested in the uncertainty around the total number of deaths suggested by the model. I can sum the observed deaths, then sum across each draw from the PPC and each time compute the difference from the observed total, like so:

# Get number of observed deaths
n_deaths = data['deaths'].sum(0)

# Coerce ppc into a dataframe
pp = pd.DataFrame(ppc['estimated deaths'].T, index=data.index)

# Compute difference-in-deaths per sample
death_diff = pp.sum(0) - n_deaths

# Plot
az.plot_density(death_diff.values)
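If a single summary is needed for the paper, the same per-draw totals give an interval directly; a minimal sketch using plain percentiles:

# Summarise the uncertainty in the total: the 3rd and 97th percentiles
# of the per-draw differences give a rough 94% interval
lo, hi = np.percentile(death_diff.values, [3, 97])
print(f"Mean difference: {death_diff.mean():.1f}, 94% interval: [{lo:.1f}, {hi:.1f}]")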

I think I am confusing myself about what a posterior predictive check is/means, but does this distribution of differences represent anything interesting about the model's predictors? The mean of the distribution is close to zero, which must mean that most of the time the number of deaths predicted by the model is very close to the observed number. Does the density of the distribution reflect anything about the quality of the predictors, i.e. if the predictors were useful, would we expect the distribution of differences to be more tightly bunched around zero?

I hope that makes sense - I think this is probably something basic in my misunderstanding of the Bayesian machinery so any help is appreciated! Thank you :slight_smile:

It's a bit of an unusual formulation that you have. The only time I would use a beta is if I did not have any covariates and just wanted a population distribution for p; otherwise it's not really worth the bother. Since you are already inverse-logit-transforming things, you can just create a normal random effect on beta0 and add the covariates to that before transforming. So,

b0_i ~ Normal(m, s)
p_i = invlogit(b0_i + b1*x1_i + b2*x2_i)
y_i ~ Binomial(n_i, p_i)

That more naturally breaks things down into random and fixed effects.
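In PyMC3 that might look something like the following (a sketch only; n_countries, country_idx, x1, x2, n and y_obs are placeholders for your own data):

# Sketch of the random-intercept formulation; data names are placeholders
with pm.Model() as model:
    # Hyperpriors for the population distribution of country intercepts
    m = pm.Normal('m', 0, 3)
    s = pm.HalfNormal('s', 1)

    # Random intercept per country: b0_i ~ Normal(m, s)
    b0 = pm.Normal('b0', mu=m, sigma=s, shape=n_countries)

    # Fixed effects for the two predictors
    b1 = pm.Normal('b1', 0, 1)
    b2 = pm.Normal('b2', 0, 1)

    # Per-country probability of death
    p = pm.math.invlogit(b0[country_idx] + b1 * x1 + b2 * x2)

    y = pm.Binomial('y', n=n, p=p, observed=y_obs)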

As far as posterior predictive checks are concerned, you are sampling from the model and then checking to see whether it's plausible that the data you observed could have been generated from that model. It doesn't tell you how useful your model is, only whether it is a reasonable fit to the data. I sometimes like to plot the percentiles of the checks, and see whether they are more or less uniform (or at least without spikes near zero or one).
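A rough sketch of that percentile check, assuming ppc['estimated deaths'] has shape (samples, observations) as in the original model:

# Fraction of posterior predictive draws below each observed count:
# if the model fits, these percentiles should look roughly uniform
import matplotlib.pyplot as plt

percentiles = (ppc['estimated deaths'] < data['deaths'].values).mean(axis=0)
plt.hist(percentiles, bins=10)
plt.xlabel('percentile of observed value within its PPC')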


Thank you, Chris! This is really helpful. It did seem quite tricky to build a beta distribution in the way I did, so I'm glad to see there is a simpler and more direct route.

I've altered the model to have a separate random effect per country, and the inverse-logit-transformed parameter is passed to the binomial likelihood as suggested, like so:

# Retrieve data
data = pd.read_csv('https://osf.io/pf9te/download')

# Integer code per country, for indexing the per-country intercepts
countries = pd.Categorical(data['country']).codes
n_countries = data['country'].nunique()
cases = data['cases'].values
deaths = data['deaths'].values

with pm.Model() as bebi:

    # Random intercept per country, plus priors on the fixed-effect coefficients
    β0 = pm.Normal('intercept', 0, 3, shape=n_countries)
    β1 = pm.Normal('male attr', 0, .5)
    β2 = pm.Normal('female attr', 0, .5)

    # Per-country probability of death on the logit scale, then transformed
    μ = pm.math.invlogit(β0[countries] + β1*data['male_attr'] + β2*data['female_attr'])

    # Likelihood is binomial
    y = pm.Binomial('estimated deaths', n=cases, p=μ, observed=deaths)

    # Inference button
    trace = pm.sample(2000, tune=1000, cores=8)

This seems a fair bit slower to sample from, and at the end I get some warnings about the maximum tree depth, but the PPC looks just as clean as my original beta-binomial implementation. I'm not sure if those warnings are problematic - from what I've read it seems like NUTS is just pulling the plug early, and the 'correctness' of the posterior is unaffected. Any thoughts on this implementation are appreciated, and thanks so much again!
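For completeness, one standard remedy for tree-depth warnings in models with per-group intercepts (not from this thread, just a common reparameterisation) is a non-centered random intercept; a sketch reusing the data variables above:

with pm.Model() as bebi_nc:

    # Hyperpriors for the country intercepts
    μ0 = pm.Normal('intercept mean', 0, 3)
    σ0 = pm.HalfNormal('intercept sd', 1)

    # Non-centered random intercepts: standard-normal offsets, scaled and
    # shifted, which often fixes the geometry behind deep-tree warnings
    z = pm.Normal('z', 0, 1, shape=n_countries)
    β0 = pm.Deterministic('intercept', μ0 + σ0 * z)

    β1 = pm.Normal('male attr', 0, .5)
    β2 = pm.Normal('female attr', 0, .5)

    μ = pm.math.invlogit(β0[countries] + β1*data['male_attr'] + β2*data['female_attr'])
    y = pm.Binomial('estimated deaths', n=cases, p=μ, observed=deaths)

    # A higher target acceptance rate can also help with sampler warnings
    trace = pm.sample(2000, tune=2000, cores=8, target_accept=0.9)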