Hierarchical model from Statistical Rethinking takes a long time to build

matsuo_basho · December 24, 2022, 3:07pm

I’m following the Python version of Statistical Rethinking Chapter 13, as found here.

I’m running Section 13.31 and it’s taking ages (model has been building for 12 hours and it’s at 20%). Here is the code so that you don’t have to jump around in the notebook:

import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc3 as pm

from scipy import stats
from scipy.special import expit as logistic

RANDOM_SEED = 8927
np.random.seed(RANDOM_SEED)

d = pd.read_csv("Data/chimpanzees.csv", sep=";")

treatment = (d.prosoc_left + 2 * d.condition).values
Ntreatments = len(np.unique(treatment))

actor = (d.actor - 1).astype(int).values
Nactor = len(np.unique(actor))

block = (d.block - 1).astype(int).values
Nblock = len(np.unique(block))

with pm.Model() as m_13_4:
    a_bar = pm.Normal("a_bar", 0.0, 1.5)
    sigma_a = pm.Exponential("sigma_a", 1.0)
    sigma_g = pm.Exponential("sigma_g", 1.0)

    a = pm.Normal("a", a_bar, sigma_a, shape=Nactor)
    g = pm.Normal("g", 0.0, sigma_g, shape=Nblock)
    b = pm.Normal("b", 0.0, 0.5, shape=Ntreatments)

    actor_ = pm.Data("actor", actor)
    block_ = pm.Data("block", block)
    treatment_ = pm.Data("treatment", treatment)
    p = pm.Deterministic("p", pm.math.invlogit(a[actor_] + g[block_] + b[treatment_]))
    pulled_left = pm.Binomial("pulled_left", 1, p, observed=d.pulled_left)

    trace_13_4 = pm.sample(tune=3000, target_accept=0.99, random_seed=RANDOM_SEED)

drbenvincent · December 24, 2022, 4:47pm

Hi @matsuo_basho
I’m guess you mean ‘sample’ rather than build? A few quick observations:

you are importing PyMC 3, which is quite old now. The latest is version 5. So you could try updating your version.
I would recommend starting with the default target_accept parameter. Ie, try just removing this argument as it’s currently set very high.
Exponential priors on standard deviations might not be suitable. If you end up with zero standard deviations then this won’t work very well. So you could try HalfNormal or Gamma distributions.

matsuo_basho · December 24, 2022, 10:27pm

I will try the first 2 suggestions. Regarding point 3, I’m running what’s in the notebook and exponential priors on standard deviations worked just fine before all throughout different chapters of this book (including in the same notebook), so will not change that.

Here’s an example from the beginning of the same notebook that worked fine before.

d = pd.read_csv("Data/reedfrogs.csv", sep=",")

with pm.Model() as m_13_2:
    a_bar = pm.Normal("a_bar", 0.0, 1.5)
    sigma = pm.Exponential("sigma", 1.0)

    a = pm.Normal("a", a_bar, sigma, shape=n_tanks)
    p = pm.math.invlogit(a[tank])

    S = pm.Binomial("S", n=d.density, p=p, observed=d.surv)
    trace_13_2 = pm.sample(random_seed=RANDOM_SEED)

cluhmann · December 25, 2022, 2:09am

I agree with @drbenvincent here. All you code works for me on PyMC 5. When setting target_accept to 0.99, sampling takes almost a minute. Removing that argument (target_accept defaults to 0.8), the same model samples in 8 seconds. I recommend upgrading and, in general, starting with a call to pm.sample() using all the defaults. There are lots of good reasons to alter those defaults, but typically in order to address something you are seeing in the sampling you have already performed.

Topic		Replies	Views
Hierarchical Model - Slow Sampling ( 1.8 sec per draw ) - Primary suspect is how the multi-level hierarchy is coded Questions	5	835	November 2, 2021
Model taking a long time to run Questions	2	429	October 17, 2020
How to speed up bambi model version agnostic bambi	2	685	April 9, 2023
Hierarchical Model - Slow Sampling Questions	4	1165	March 26, 2020
Any suggestions on speeding up the model? modeling	9	428	April 2, 2024

Hierarchical model from Statistical Rethinking takes a long time to build

Related topics