My first (hierarchical) PyMC model is very slow

I tried to create my first PyMC model yesterday, but it seems to be very slow and maybe there are also some convergence issues. Basically, I want to implement a hierarchical Bayesian model that originally looks like this:

\boldsymbol{y}|(\boldsymbol{\mu}, \boldsymbol{\Sigma}) \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})
\boldsymbol{\mu}|(\mu_0,\boldsymbol{\Sigma}) \sim N(\mu_0\boldsymbol{1}, \boldsymbol{\Sigma} / \kappa_0)
\boldsymbol{\Sigma} \sim \text{Inv-Wishart}_{\nu_0}(\tau_0^{-2}\boldsymbol{P_0}^{-1})
\mu_0 \sim \text{Uniform}(-\infty, \infty)
\tau_0^2 \sim \text{Inv-Gamma}(\epsilon_1, \epsilon_2)

where \epsilon_1, \epsilon_2, \kappa_0, P_0 and \nu_0 are known.

To implement it in PyMC, I replaced the Inverse-Wishart prior with an LKJ prior on the Cholesky factor of the covariance matrix, and used a wide Normal distribution instead of the improper Uniform(-\infty, \infty) prior. This is my model:

import pymc as pm
import pytensor.tensor as pt

with pm.Model() as model:
    # Prior for unknown scale parameter
    tau_sq = pm.InverseGamma('tau_sq', alpha=epsilon_1, beta=epsilon_2)

    # Cholesky decomposed covariance matrix using LKJCholeskyCov
    chol, corr, stds = pm.LKJCholeskyCov('chol_cov', n=n_assets, eta=2.0, sd_dist=pm.Exponential.dist(1.0))

    # Scale the covariance matrix by tau_sq
    L = chol * pt.sqrt(tau_sq)  # Scaling factor applied here
    Sigma = pm.Deterministic('Sigma', pt.dot(L, pt.transpose(L)))

    # Prior for the hierarchical mean
    mu_0 = pm.Normal('mu_0', mu=0, sigma=10) 

    # Hierarchical prior for mu
    mu = pm.MvNormal('mu', mu=pt.ones(n_assets) * mu_0, cov=Sigma / kappa_0, shape=n_assets)

    # Likelihood
    obs = pm.MvNormal('obs', mu=mu, cov=Sigma, observed=returns_np)

    # Sampling from the posterior
    trace = pm.sample()

returns_np contains 200 x 50 stock returns, i.e. 50 stocks and 200 observations per stock. After maybe 30 minutes the sampler still has not finished, and early on I get a lot of messages like this:
DEBUG:pymc.stats.convergence:Energy change in leapfrog step is too large: 658625.0689315796.

What could be the issue with my model/implementation? Or is this expected?

In the context of pure HMC: a large energy change during the molecular-dynamics (leapfrog) step (I guess it’s not checked after each leapfrog step, probably only at the end of the trajectory) corresponds to a low acceptance probability in the Metropolis step. With such a large difference your proposed configuration will essentially always be rejected, and your sampling will get stuck.
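For reference, the Metropolis acceptance probability at the end of a trajectory depends on the change in the Hamiltonian (total energy) H, so with the energy change reported in your log the proposal is accepted with probability essentially zero:

P(\text{accept}) = \min\left(1, e^{-\Delta H}\right), \qquad \Delta H \approx 6.6 \times 10^5 \;\Rightarrow\; e^{-\Delta H} \approx 0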

But PyMC doesn’t really use plain HMC, it uses NUTS. So I do not know whether this message means that you are hitting the maximum tree depth and NUTS did not manage to find a trajectory length that reaches the target acceptance, or whether such a large number is anomalous even in the first NUTS steps (the exponential of such a number cannot be computed) but the program ignores it and keeps building the tree. In any case, I would try playing with the max_treedepth setting of NUTS when calling sample(). But increasing it is likely to make your sampling even slower.
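As a rough sketch (assuming you build the NUTS step explicitly and pass it to sample(), which avoids guessing how keyword arguments are forwarded):

with model:
    # Explicitly configure NUTS; larger max_treedepth / target_accept
    # typically make each draw slower but can help with difficult posteriors
    step = pm.NUTS(target_accept=0.95, max_treedepth=12)
    trace = pm.sample(step=step)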

I also remember that models based on LKJ priors had some issues with multi-threading, especially if you are not on the latest version. Using chains=1 or sampling with numpyro might help.
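For example (a sketch; the numpyro option is available in recent PyMC versions and needs the optional numpyro/JAX dependencies installed):

with model:
    # Single-chain sampling rules out multi-threading issues
    trace = pm.sample(chains=1)

    # Alternatively, hand sampling over to the numpyro NUTS implementation
    trace = pm.sample(nuts_sampler="numpyro")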


Before playing with the sampler settings, I’d interrogate your model a bit. I’m leery of all the scaling you’re doing to the covariance matrix. Is it necessary? Do you have the same sampling issues if you take out tau_sq and kappa_0?
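To make that concrete, here is a minimal sketch of the stripped-down version I have in mind (keeping your variable names, and passing the Cholesky factor to MvNormal directly, which also saves building Sigma explicitly):

with pm.Model() as simpler_model:
    # LKJ prior on the Cholesky factor, no tau_sq scaling
    chol, corr, stds = pm.LKJCholeskyCov('chol_cov', n=n_assets, eta=2.0,
                                         sd_dist=pm.Exponential.dist(1.0))

    mu_0 = pm.Normal('mu_0', mu=0, sigma=10)

    # Hierarchical mean without the kappa_0 scaling
    mu = pm.MvNormal('mu', mu=pt.ones(n_assets) * mu_0, chol=chol, shape=n_assets)

    # Likelihood parameterized by the same Cholesky factor
    obs = pm.MvNormal('obs', mu=mu, chol=chol, observed=returns_np)

    trace = pm.sample()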

It also seems very strange to me to use the same covariance matrix for the expected returns mu and the innovations (residuals?). If you asked me to guess, I’d say one or the other should have a diagonal covariance. Either stock returns are correlated because their expected returns are inherently correlated (in which case mu is MvNormal), or because there are commonalities in the exogenous shocks that drive prices (in which case obs is MvNormal). Maybe it’s both, but why would you expect the correlation structure of the two to be the same?
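For instance, if you think only the shocks are correlated, the hierarchical mean could get independent Normal priors instead of sharing Sigma. A sketch (the prior scale of 1.0 on mu is just a placeholder I picked for illustration):

with pm.Model() as diag_mu_model:
    # Same LKJ prior on the Cholesky factor as above
    chol, corr, stds = pm.LKJCholeskyCov('chol_cov', n=n_assets, eta=2.0,
                                         sd_dist=pm.Exponential.dist(1.0))

    mu_0 = pm.Normal('mu_0', mu=0, sigma=10)

    # Expected returns are a priori independent (diagonal covariance) ...
    mu = pm.Normal('mu', mu=mu_0, sigma=1.0, shape=n_assets)

    # ... while the innovations share the full covariance structure
    obs = pm.MvNormal('obs', mu=mu, chol=chol, observed=returns_np)

    trace = pm.sample()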

My standing advice for all models is to start simple and build up. Figure out the simplest possible model you can extract from what you have, then fit, score, and save it. Then add one feature and repeat until you hit problems. This will identify the components of the model that are causing trouble, and at the same time give you some sense of the marginal return on added complexity in your model.


Thanks, I will consider this. Also, I have realized that it might be possible to derive a stochastic representation of the model, which would mean I don’t have to use MCMC at all. I will keep you updated.

Thanks, I agree with you that it’s always a good idea to start with a simple model and to question the model setup. However, this particular model was proposed by Greyserman et al. (2006) in their paper "Portfolio selection using hierarchical Bayesian analysis and MCMC methods", and I just wanted to implement it for comparison, so I don’t want to change it too much.

Looking very quickly, it seems like those authors may well be using a Gibbs sampler (though they don’t really seem to discuss where exactly their results come from), which is not totally unexpected given that the paper was published almost two decades ago. If that is the case, you should not necessarily expect the exact parameterization they describe to work well with other (faster, more modern) sampling algorithms (e.g., HMC/NUTS).
