NUTS sampler slowing down on a hierarchical model

I am still new to PyMC3 and statistics, so I am looking for a critique of my hierarchical model that will hopefully improve its sampling speed. Right now, it starts out fast but slows down dramatically <1 sample/second after about 400 samples.

The data set consists of about 2000 timeseries (these are product revenues), each about 30 time periods in length. A lot of the data is missing (right-censored). I model the timeseries as a piecewise function (logistic growth, followed by exponential decay) and use PyMC3 to infer the parameters of that function for each individual timeseries, as well as its hyperparameters for the whole dataset.

Hopefully, it makes some sense with the code:

y = CF # revenue data
t = np.arange(y.shape[1]) # time index
n_products = y.shape[0] # number of products to be modeled
s = (n_products, 1) # Dimensionality of prior distributions

with pm.Model() as truncated_like_model:
    # Logistic growth hyperpriors
    Smax_sd = pm.HalfCauchy('Smax_sd', beta = 1000)
    t0_mu = pm.Normal('t0_mu', mu = -.5)
    t0_sd = pm.HalfCauchy('t0_sd', beta = 1)
    Kon_sd = pm.HalfCauchy('Kon_sd', beta = 1)
    # Logistic growth priors
    Smax = pm.HalfNormal('Smax', sd = Smax_sd, shape = s)
    t0 = pm.Normal('t0', mu = t0_mu, sd = t0_sd, shape = s)
    Kon = pm.HalfNormal('Kon', sd = Kon_sd, shape = s)
    # Exponential decay hyperpriors
    t1_sd = pm.HalfCauchy('t1_sd', beta = 2)
    Koff_sd = pm.HalfCauchy('Koff_sd', beta = 1)    
    # Exponential decay priors
    t1 = pm.HalfNormal('t1', sd = t1_sd, shape = s)
    Koff = pm.HalfNormal('Koff', sd = Koff_sd, shape = s)
    # Piecewise sales function
    growth = Smax / (1 + pm.math.exp(-Kon*(t-t0)))
    decay = Smax * pm.math.exp(-Koff*(t-t1))
    # Soft switch
    weight = pm.math.sigmoid(2 * (t - t1))

    # Likelihood
    mu = pm.math.abs_(growth + weight * (decay - growth))
    sigma = pm.HalfCauchy('sigma', beta=100) #* pm.math.sqrt(pm.math.sqrt(t+1.))
    likelihood = pm.TruncatedNormal('sales', mu = mu, sd = sigma, lower = 0, upper = y.max().max(), observed = y)

Any advice is greately appreciated!

This prior specification looks a little weird in the sense that it’s so diffuse:

Otherwise, I don’t see anything grossly out of place. ~1 sample a second doesn’t sound too bad for a nonlinear model over 60k data points. Have you considered trying to run it on a GPU?

Thanks! Appreciate the reply. Yes, I was thinking I could try fitting the model on a smaller subset first and then use the results to set more informed priors, perhaps using prior predictive checks. Don’t know if that is the recommended practice or not, any thoughts?

It’s not impossible that having more focused priors helps, but usually that’s the most helpful when you have divergences because the log posterior density is too flat and there’s not enough information from the data + prior.

1 Like

Thanks, Chris. I appreciate it. The sampler speed greatly improved when I narrowed the prior distribution for the sigma parameter.

I realized that my parametric function does not fit some of the data though. So, I am thinking of going to a more general probabilistic model, such as the GaussianRandomWalk, or the EulerMurayama SDE method. I am reading up on both methods to get a better sense of the underlying mechanics. Any advice on which approach might be a better fit here?

The canonical way to put a prior on functions is with a Gaussian process. It’s essentially a continuous-time limit of the GaussianRandomWalk and the PyMC3 API for it is really nice. I haven’t used the SDE functionality yet, so no comment on that.

Quick update: I tried fitting a GRW model on a single timeseries and I seem to get a very good fit upon visual inspection, but I keep getting some divergences during sampling.

Any thoughts on why I am getting divergences with GRW? Here’s the code:

with pm.Model() as model:
    sigma = pm.HalfNormal('sigma', 300)
    alpha = pm.Uniform('alpha', 0, 1)
    mu = pm.GaussianRandomWalk('mu', 
                              sigma=sigma * (1. - alpha), 
    likelihood = pm.Normal('sales', 
                      sigma=sigma * alpha, 

Also, I am not sure how to scale this model to the entire dataset? Should I also try to build a hierarchical model here or use the Multivariate GRW class?

Btw, the SDE model is blazing fast with no divergences but, interestingly, the posterior dist samples look a lot noisier compared to GRW and the fit is not visually as good.

Thank you.