I’m following the Python version of Statistical Rethinking Chapter 13, as found here.
I’m running Section 13.31 and it’s taking ages (model has been building for 12 hours and it’s at 20%). Here is the code so that you don’t have to jump around in the notebook:
import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc3 as pm
from scipy import stats
from scipy.special import expit as logistic
RANDOM_SEED = 8927
np.random.seed(RANDOM_SEED)
d = pd.read_csv("Data/chimpanzees.csv", sep=";")
treatment = (d.prosoc_left + 2 * d.condition).values
Ntreatments = len(np.unique(treatment))
actor = (d.actor - 1).astype(int).values
Nactor = len(np.unique(actor))
block = (d.block - 1).astype(int).values
Nblock = len(np.unique(block))
with pm.Model() as m_13_4:
a_bar = pm.Normal("a_bar", 0.0, 1.5)
sigma_a = pm.Exponential("sigma_a", 1.0)
sigma_g = pm.Exponential("sigma_g", 1.0)
a = pm.Normal("a", a_bar, sigma_a, shape=Nactor)
g = pm.Normal("g", 0.0, sigma_g, shape=Nblock)
b = pm.Normal("b", 0.0, 0.5, shape=Ntreatments)
actor_ = pm.Data("actor", actor)
block_ = pm.Data("block", block)
treatment_ = pm.Data("treatment", treatment)
p = pm.Deterministic("p", pm.math.invlogit(a[actor_] + g[block_] + b[treatment_]))
pulled_left = pm.Binomial("pulled_left", 1, p, observed=d.pulled_left)
trace_13_4 = pm.sample(tune=3000, target_accept=0.99, random_seed=RANDOM_SEED)
Hi @matsuo_basho
I’m guess you mean ‘sample’ rather than build? A few quick observations:
- you are importing PyMC 3, which is quite old now. The latest is version 5. So you could try updating your version.
- I would recommend starting with the default
target_accept
parameter. Ie, try just removing this argument as it’s currently set very high.
- Exponential priors on standard deviations might not be suitable. If you end up with zero standard deviations then this won’t work very well. So you could try
HalfNormal
or Gamma
distributions.
1 Like
I will try the first 2 suggestions. Regarding point 3, I’m running what’s in the notebook and exponential priors on standard deviations worked just fine before all throughout different chapters of this book (including in the same notebook), so will not change that.
Here’s an example from the beginning of the same notebook that worked fine before.
d = pd.read_csv("Data/reedfrogs.csv", sep=",")
with pm.Model() as m_13_2:
a_bar = pm.Normal("a_bar", 0.0, 1.5)
sigma = pm.Exponential("sigma", 1.0)
a = pm.Normal("a", a_bar, sigma, shape=n_tanks)
p = pm.math.invlogit(a[tank])
S = pm.Binomial("S", n=d.density, p=p, observed=d.surv)
trace_13_2 = pm.sample(random_seed=RANDOM_SEED)
I agree with @drbenvincent here. All you code works for me on PyMC 5. When setting target_accept
to 0.99
, sampling takes almost a minute. Removing that argument (target_accept
defaults to 0.8
), the same model samples in 8 seconds. I recommend upgrading and, in general, starting with a call to pm.sample()
using all the defaults. There are lots of good reasons to alter those defaults, but typically in order to address something you are seeing in the sampling you have already performed.
1 Like