My first (hierarchical) PyMC model is very slow

I tried to create my first PyMC model yesterday, but it seems to be very slow and maybe there are also some convergence issues. Basically, I want to implement a hierarchical Bayesian model that originally looks like this:

\boldsymbol{y}|(\boldsymbol{\mu}, \boldsymbol{\Sigma}) \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})
\boldsymbol{\mu}|(\mu_0,\boldsymbol{\Sigma}) \sim N(\mu_0\boldsymbol{1}, \boldsymbol{\Sigma} / \kappa_0)
\boldsymbol{\Sigma} \sim \text{Inv-Wishart}_{\nu_0}(\tau_0^{-2}\boldsymbol{P_0}^{-1})
\mu_0 \sim \text{Uniform}(-\infty, \infty)
\tau_0^2 \sim \text{Inv-Gamma}(\epsilon_1, \epsilon_2)

where \epsilon_1, \epsilon_2, \kappa_0, P_0 and \nu_0 are known.

To implement it in PyMC, I replaced the Inverse-Wishart prior with an LKJ prior on the Cholesky factor of the covariance matrix, and used a wide Normal distribution instead of the improper Uniform(-\infty, \infty) prior. This is my model:

import pymc as pm
import pytensor.tensor as pt

with pm.Model() as model:
    # Prior for unknown scale parameter
    tau_sq = pm.InverseGamma('tau_sq', alpha=epsilon_1, beta=epsilon_2)

    # Cholesky decomposed covariance matrix using LKJCholeskyCov
    chol, corr, stds = pm.LKJCholeskyCov('chol_cov', n=n_assets, eta=2.0, sd_dist=pm.Exponential.dist(1.0))

    # Scale the covariance matrix by tau_sq
    L = chol * pt.sqrt(tau_sq)  # Scaling factor applied here
    Sigma = pm.Deterministic('Sigma', pt.dot(L, pt.transpose(L)))

    # Prior for the hierarchical mean
    mu_0 = pm.Normal('mu_0', mu=0, sigma=10) 

    # Hierarchical prior for mu
    mu = pm.MvNormal('mu', mu=pt.ones(n_assets) * mu_0, cov=Sigma / kappa_0, shape=n_assets)

    # Likelihood
    obs = pm.MvNormal('obs', mu=mu, cov=Sigma, observed=returns_np)

    # Sampling from the posterior
    trace = pm.sample()

returns_np contains 200 x 50 stock returns, i.e. 50 stocks and 200 observations per stock. After maybe 30 minutes the sampler still has not finished, and early on I get a lot of messages like this:
DEBUG:pymc.stats.convergence:Energy change in leapfrog step is too large: 658625.0689315796.

What could be the issue with my model/implementation? Or is this expected?

In the context of pure HMC: a large energy change during the molecular-dynamics (leapfrog) step (I guess it’s not checked after each leapfrog step, probably only at the end of the trajectory) corresponds to a low acceptance probability in the Metropolis step. With such a large difference your proposed configuration will essentially always be rejected, and your sampling will get stuck.
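For reference, the Metropolis acceptance probability at the end of a trajectory depends on the change in the Hamiltonian (total energy) H, so with the energy change reported in your log the proposal is accepted with probability essentially zero:

P(\text{accept}) = \min\left(1, e^{-\Delta H}\right), \qquad \Delta H \approx 6.6 \times 10^5 \;\Rightarrow\; e^{-\Delta H} \approx 0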

But PyMC doesn’t really use plain HMC, it uses NUTS. So I do not know whether this message means that you are hitting the maximum tree depth and NUTS did not manage to find a trajectory length that reaches the target acceptance, or whether such a large number is anomalous even in the first NUTS steps (the exponential of such a number cannot be computed) but the program ignores it and keeps building the tree. In any case, I would try playing with the max_treedepth setting of NUTS when calling sample(). But increasing it is likely to make your sampling even slower.
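As a rough sketch (assuming you build the NUTS step explicitly and pass it to sample(), which avoids guessing how keyword arguments are forwarded):

with model:
    # Explicitly configure NUTS; larger max_treedepth / target_accept
    # typically make each draw slower but can help with difficult posteriors
    step = pm.NUTS(target_accept=0.95, max_treedepth=12)
    trace = pm.sample(step=step)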

I also remember that models based on LKJ priors had some issues with multi-threading, especially if you are not on the latest version. Using chains=1 or sampling with numpyro might help.
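For example (a sketch; the numpyro option is available in recent PyMC versions and needs the optional numpyro/JAX dependencies installed):

with model:
    # Single-chain sampling rules out multi-threading issues
    trace = pm.sample(chains=1)

    # Alternatively, hand sampling over to the numpyro NUTS implementation
    trace = pm.sample(nuts_sampler="numpyro")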


Before playing with the sampler settings, I’d interrogate your model a bit. I’m leery of all the scaling you’re doing to the covariance matrix. Is it necessary? Do you have the same sampling issues if you take out tau_sq and kappa_0?
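To make that concrete, here is a minimal sketch of the stripped-down version I have in mind (keeping your variable names, and passing the Cholesky factor to MvNormal directly, which also saves building Sigma explicitly):

with pm.Model() as simpler_model:
    # LKJ prior on the Cholesky factor, no tau_sq scaling
    chol, corr, stds = pm.LKJCholeskyCov('chol_cov', n=n_assets, eta=2.0,
                                         sd_dist=pm.Exponential.dist(1.0))

    mu_0 = pm.Normal('mu_0', mu=0, sigma=10)

    # Hierarchical mean without the kappa_0 scaling
    mu = pm.MvNormal('mu', mu=pt.ones(n_assets) * mu_0, chol=chol, shape=n_assets)

    # Likelihood parameterized by the same Cholesky factor
    obs = pm.MvNormal('obs', mu=mu, chol=chol, observed=returns_np)

    trace = pm.sample()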

It also seems very strange to me to use the same covariance matrix for the expected returns mu and the innovations (residuals?). If you asked me to guess, I’d say one or the other should have a diagonal covariance. Either stock returns are correlated because their expected returns are inherently correlated (in which case mu is MvNormal), or because there are commonalities in the exogenous shocks that drive prices (in which case obs is MvNormal). Maybe it’s both, but why would you expect the correlation structure of the two to be the same?
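For instance, if you think only the shocks are correlated, the hierarchical mean could get independent Normal priors instead of sharing Sigma. A sketch (the prior scale of 1.0 on mu is just a placeholder I picked for illustration):

with pm.Model() as diag_mu_model:
    # Same LKJ prior on the Cholesky factor as above
    chol, corr, stds = pm.LKJCholeskyCov('chol_cov', n=n_assets, eta=2.0,
                                         sd_dist=pm.Exponential.dist(1.0))

    mu_0 = pm.Normal('mu_0', mu=0, sigma=10)

    # Expected returns are a priori independent (diagonal covariance) ...
    mu = pm.Normal('mu', mu=mu_0, sigma=1.0, shape=n_assets)

    # ... while the innovations share the full covariance structure
    obs = pm.MvNormal('obs', mu=mu, chol=chol, observed=returns_np)

    trace = pm.sample()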

My standing advice for all models is to start simple and build up. Figure out the simplest possible model you can extract from what you have, then fit, score, and save it. Then add one feature and repeat until you hit problems. This will identify the components of the model that are causing trouble, and at the same time give you some sense of the marginal return on added complexity in your model.


Thanks, I will consider this. Also, I have realized that it might be possible to derive a stochastic representation of the model, which would mean I don’t have to use MCMC at all. I will keep you updated.

Thanks, I agree with you that it’s always a good idea to start with a simple model and to question the model setup. However, this particular model was proposed by Greyserman et al. (2006) in their paper "Portfolio selection using hierarchical Bayesian analysis and MCMC methods", and I just wanted to implement it for comparison, so I don’t want to change it too much.

Looking very quickly, it seems like those authors may well be using a Gibbs sampler (though they don’t really seem to discuss where exactly their results come from), which is not totally unexpected given that the paper was published almost two decades ago. If that is the case, you should not necessarily expect the exact parameterization they describe to work well with other (faster, more modern) sampling algorithms (e.g., HMC/NUTS).
