Hierarchical model from Statistical Rethinking takes a long time to build

I’m following the Python version of Statistical Rethinking Chapter 13, as found here.

I’m running Section 13.31 and it’s taking ages (model has been building for 12 hours and it’s at 20%). Here is the code so that you don’t have to jump around in the notebook:

import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc3 as pm

from scipy import stats
from scipy.special import expit as logistic

RANDOM_SEED = 8927
np.random.seed(RANDOM_SEED)

d = pd.read_csv("Data/chimpanzees.csv", sep=";")

treatment = (d.prosoc_left + 2 * d.condition).values
Ntreatments = len(np.unique(treatment))

actor = (d.actor - 1).astype(int).values
Nactor = len(np.unique(actor))

block = (d.block - 1).astype(int).values
Nblock = len(np.unique(block))

with pm.Model() as m_13_4:
    a_bar = pm.Normal("a_bar", 0.0, 1.5)
    sigma_a = pm.Exponential("sigma_a", 1.0)
    sigma_g = pm.Exponential("sigma_g", 1.0)

    a = pm.Normal("a", a_bar, sigma_a, shape=Nactor)
    g = pm.Normal("g", 0.0, sigma_g, shape=Nblock)
    b = pm.Normal("b", 0.0, 0.5, shape=Ntreatments)

    actor_ = pm.Data("actor", actor)
    block_ = pm.Data("block", block)
    treatment_ = pm.Data("treatment", treatment)
    p = pm.Deterministic("p", pm.math.invlogit(a[actor_] + g[block_] + b[treatment_]))
    pulled_left = pm.Binomial("pulled_left", 1, p, observed=d.pulled_left)

    trace_13_4 = pm.sample(tune=3000, target_accept=0.99, random_seed=RANDOM_SEED)

Hi @matsuo_basho
I’m guess you mean ‘sample’ rather than build? A few quick observations:

  • you are importing PyMC 3, which is quite old now. The latest is version 5. So you could try updating your version.
  • I would recommend starting with the default target_accept parameter. Ie, try just removing this argument as it’s currently set very high.
  • Exponential priors on standard deviations might not be suitable. If you end up with zero standard deviations then this won’t work very well. So you could try HalfNormal or Gamma distributions.
1 Like

I will try the first 2 suggestions. Regarding point 3, I’m running what’s in the notebook and exponential priors on standard deviations worked just fine before all throughout different chapters of this book (including in the same notebook), so will not change that.

Here’s an example from the beginning of the same notebook that worked fine before.

d = pd.read_csv("Data/reedfrogs.csv", sep=",")

with pm.Model() as m_13_2:
    a_bar = pm.Normal("a_bar", 0.0, 1.5)
    sigma = pm.Exponential("sigma", 1.0)

    a = pm.Normal("a", a_bar, sigma, shape=n_tanks)
    p = pm.math.invlogit(a[tank])

    S = pm.Binomial("S", n=d.density, p=p, observed=d.surv)
    trace_13_2 = pm.sample(random_seed=RANDOM_SEED)

I agree with @drbenvincent here. All you code works for me on PyMC 5. When setting target_accept to 0.99, sampling takes almost a minute. Removing that argument (target_accept defaults to 0.8), the same model samples in 8 seconds. I recommend upgrading and, in general, starting with a call to pm.sample() using all the defaults. There are lots of good reasons to alter those defaults, but typically in order to address something you are seeing in the sampling you have already performed.

1 Like