What is tune in pm.sample(1000, tune=1000)?
I understand 1000 samples are taken from the prior to estimate the posterior, but what does tune do? Changing it to a low number messes everything up.
Hi @Rahul_Deora,
Take a quick peek at @colcarroll’s series of posts on implementing Hamiltonian Monte Carlo. Post #3 covers this in detail:
One of the most immediate improvements you can make to Hamiltonian Monte Carlo (HMC) is to implement step size adaptation, which gives you fewer parameters to tune, and adds in the concept of “warmup” or “tuning” for your sampler.
Can you explain briefly what that means? I’m not quite getting it. I am doing Statistical Rethinking, and the pymc devs have used MCMC in place of quadratic approximation. The book explains MCMC much later.
Yes! A design goal of PyMC3 is to let the user worry about statistical modelling rather than about inference, and tuning attempts to automatically set some of the dozens of knobs available in modern MCMC methods.
As a basic, concrete example, Metropolis-Hastings MCMC starts at a point x, then draws x' from Normal(x, sd), and does some math to accept or reject x': if it rejects x', you add x to your samples again.
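The accept/reject loop described above can be sketched in a few lines of plain Python. This is only an illustration, not PyMC3's implementation: the standard-normal target and the names log_target and metropolis are assumptions made for the example.

```python
import math
import random


def log_target(x):
    # Assumed target for illustration: unnormalized log density of Normal(0, 1).
    return -0.5 * x * x


def metropolis(n_samples, sd, x0=0.0, seed=0):
    """Random-walk Metropolis with a Normal(x, sd) proposal."""
    rng = random.Random(seed)
    x = x0
    samples = []
    n_accepted = 0
    for _ in range(n_samples):
        x_new = rng.gauss(x, sd)  # draw x' from Normal(x, sd)
        log_ratio = log_target(x_new) - log_target(x)
        # Accept with probability min(1, p(x') / p(x)).
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = x_new
            n_accepted += 1
        # On rejection x is unchanged, so the old point is recorded again.
        samples.append(x)
    return samples, n_accepted / n_samples


samples, accept_rate = metropolis(5000, sd=1.0)
print(f"acceptance rate with sd=1.0: {accept_rate:.2f}")
```

Running it with a very small sd and then a very large sd shows the acceptance rate moving in opposite directions, which is the knob that tuning adjusts.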
So how do we choose sd for the proposal distribution? There are some papers that suggest Metropolis-Hastings is most efficient when you accept 23.4% of proposed samples, and it turns out that lowering the step size increases the probability of accepting a proposal. PyMC3 will spend the first 500 steps increasing and decreasing the step size to try to find the value of sd that gives you an acceptance rate of 23.4% (you can even set a different target acceptance rate).
The problem is that if you change the step size while sampling, you lose the guarantees that your samples (asymptotically) come from the target distribution, so you should typically discard these. Also, there is typically a lot more adaptation going on in those first steps than just step_size.
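Here is a rough sketch of that idea: adapt sd during a tuning phase, then freeze it and keep only the draws taken afterwards. The multiply-or-divide-by-1.5 rule, the window size, and all function names are assumptions chosen for illustration; PyMC3's actual adaptation is more involved.

```python
import math
import random

TARGET_ACCEPT = 0.234  # target acceptance rate mentioned in the post


def log_target(x):
    # Assumed target for illustration: unnormalized log density of Normal(0, 1).
    return -0.5 * x * x


def mh_step(x, sd, rng):
    """One Metropolis step; returns the new state and whether it was accepted."""
    x_new = rng.gauss(x, sd)
    log_ratio = log_target(x_new) - log_target(x)
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return x_new, True
    return x, False


def sample_with_tuning(draws, tune, sd=0.01, window=50, seed=0):
    rng = random.Random(seed)
    x = 0.0
    accepted = 0
    # Tuning phase: sd changes as we go, so these draws are thrown away.
    for i in range(1, tune + 1):
        x, ok = mh_step(x, sd, rng)
        accepted += ok
        if i % window == 0:
            rate = accepted / window
            # Accepting too often: take bigger steps. Too rarely: smaller steps.
            sd = sd * 1.5 if rate > TARGET_ACCEPT else sd / 1.5
            accepted = 0
    # Sampling phase: sd is now fixed, and only these draws are kept.
    samples = []
    kept_accepts = 0
    for _ in range(draws):
        x, ok = mh_step(x, sd, rng)
        kept_accepts += ok
        samples.append(x)
    return samples, sd, kept_accepts / draws


samples, tuned_sd, rate = sample_with_tuning(draws=2000, tune=2000)
print(f"tuned sd: {tuned_sd:.2f}, acceptance rate after tuning: {rate:.2f}")
```

With a very small tune value the loop never gets far from the poor starting sd, which is one way a low tune setting can hurt the kept draws.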
tl;dr: The first tune steps allow the PyMC3 developers to adjust parameters based on best practices and current research.
So should tune equal the number of samples?
Nope! They are two parameters set separately: pm.sample(n_samples, tune=n_tune).
I think the default of 500 samples and 500 tuning samples is usually good, but more tuning can sometimes help for complicated geometries, and more samples can sometimes help if you are making careful estimates.
Kinda hard to do when it takes forever. My n_samples=10 and n_tune=3 took about an hour and a half to run, on a small dataset of ~3000 rows.
So, what would you suggest here? Or is it just me who is getting these insanely long run times (I’m on a MacBook Air M1)?
Hi! The short answer is that it may be difficult to sample.
There are certain posteriors that might frustrate gradient-based samplers. On each iteration, NUTS will take up to 1,024 steps, checking for a U-turn as it goes, and will stop expanding whenever it encounters one. If sampling is taking this long, it may be taking all 1,024 steps on every iteration. This means
- the sampler is not actually encountering a U-turn, meaning the draws will be more correlated than they could be, and
- it takes a long time (1,024 log_prob evaluations, and 1,024 grad(log_prob) evaluations, more or less).
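To put a rough number on that cost: 1,024 is 2^10, which matches a trajectory that doubles at most 10 times. The sketch below assumes that maximum depth of 10 and one gradient evaluation per step. The function names are made up for the example and are not part of pymc.

```python
def max_leapfrog_steps(max_treedepth):
    # Each doubling can double the trajectory length, so depth d allows
    # roughly 2**d steps before the sampler gives up looking for a U-turn.
    return 2 ** max_treedepth


def worst_case_grad_evals(draws, tune, max_treedepth=10):
    # Worst case per chain: every iteration, tuning or not, hits the step limit.
    return (draws + tune) * max_leapfrog_steps(max_treedepth)


print(max_leapfrog_steps(10))         # 1024
print(worst_case_grad_evals(10, 3))   # 13312
```

So even the 10 draws and 3 tuning steps mentioned above can mean up to 13,312 gradient evaluations per chain if every iteration hits the limit.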
Sampling typically will get faster as tuning goes along (I think currently in pymc, a new mass matrix is used after 101 tuning draws), but if that takes prohibitively long, you might have to think about
- an alternative strategy to summarize the posterior (optimization, VI, pen and paper),
- changing some priors to more well behaved distributions that are reasonably informative (i.e., make everything normal or half-normal with scales that are like 10, instead of like 1e10),
- changing the model structure to better capture how the data were generated
Sorry this isn’t an easy answer!