NUTS sampler slowing down on a hierarchical model

sharsenij14 · February 25, 2021, 6:44am

I am still new to PyMC3 and statistics, so I am looking for a critique of my hierarchical model that will hopefully improve its sampling speed. Right now, it starts out fast but slows down dramatically <1 sample/second after about 400 samples.

The data set consists of about 2000 timeseries (these are product revenues), each about 30 time periods in length. A lot of the data is missing (right-censored). I model the timeseries as a piecewise function (logistic growth, followed by exponential decay) and use PyMC3 to infer the parameters of that function for each individual timeseries, as well as its hyperparameters for the whole dataset.

Hopefully, it makes some sense with the code:

y = CF # revenue data
t = np.arange(y.shape[1]) # time index
n_products = y.shape[0] # number of products to be modeled
s = (n_products, 1) # Dimensionality of prior distributions

with pm.Model() as truncated_like_model:
    
    # Logistic growth hyperpriors
    Smax_sd = pm.HalfCauchy('Smax_sd', beta = 1000)
    t0_mu = pm.Normal('t0_mu', mu = -.5)
    t0_sd = pm.HalfCauchy('t0_sd', beta = 1)
    Kon_sd = pm.HalfCauchy('Kon_sd', beta = 1)
    
    # Logistic growth priors
    Smax = pm.HalfNormal('Smax', sd = Smax_sd, shape = s)
    t0 = pm.Normal('t0', mu = t0_mu, sd = t0_sd, shape = s)
    Kon = pm.HalfNormal('Kon', sd = Kon_sd, shape = s)
    
    # Exponential decay hyperpriors
    t1_sd = pm.HalfCauchy('t1_sd', beta = 2)
    Koff_sd = pm.HalfCauchy('Koff_sd', beta = 1)    
    
    # Exponential decay priors
    t1 = pm.HalfNormal('t1', sd = t1_sd, shape = s)
    Koff = pm.HalfNormal('Koff', sd = Koff_sd, shape = s)
    
    # Piecewise sales function
    growth = Smax / (1 + pm.math.exp(-Kon*(t-t0)))
    decay = Smax * pm.math.exp(-Koff*(t-t1))
    
    # Soft switch
    weight = pm.math.sigmoid(2 * (t - t1))

    # Likelihood
    mu = pm.math.abs_(growth + weight * (decay - growth))
    sigma = pm.HalfCauchy('sigma', beta=100) #* pm.math.sqrt(pm.math.sqrt(t+1.))
    likelihood = pm.TruncatedNormal('sales', mu = mu, sd = sigma, lower = 0, upper = y.max().max(), observed = y)

Any advice is greately appreciated!

ckrapu · February 26, 2021, 7:25pm

This prior specification looks a little weird in the sense that it’s so diffuse:

Otherwise, I don’t see anything grossly out of place. ~1 sample a second doesn’t sound too bad for a nonlinear model over 60k data points. Have you considered trying to run it on a GPU?

sharsenij14 · February 26, 2021, 11:23pm

Thanks! Appreciate the reply. Yes, I was thinking I could try fitting the model on a smaller subset first and then use the results to set more informed priors, perhaps using prior predictive checks. Don’t know if that is the recommended practice or not, any thoughts?

ckrapu · February 27, 2021, 12:10am

It’s not impossible that having more focused priors helps, but usually that’s the most helpful when you have divergences because the log posterior density is too flat and there’s not enough information from the data + prior.

sharsenij14 · February 27, 2021, 8:53pm

Thanks, Chris. I appreciate it. The sampler speed greatly improved when I narrowed the prior distribution for the sigma parameter.

I realized that my parametric function does not fit some of the data though. So, I am thinking of going to a more general probabilistic model, such as the GaussianRandomWalk, or the EulerMurayama SDE method. I am reading up on both methods to get a better sense of the underlying mechanics. Any advice on which approach might be a better fit here?

ckrapu · February 27, 2021, 11:37pm

The canonical way to put a prior on functions is with a Gaussian process. It’s essentially a continuous-time limit of the GaussianRandomWalk and the PyMC3 API for it is really nice. I haven’t used the SDE functionality yet, so no comment on that.

sharsenij14 · February 28, 2021, 6:57am

Quick update: I tried fitting a GRW model on a single timeseries and I seem to get a very good fit upon visual inspection, but I keep getting some divergences during sampling.

Any thoughts on why I am getting divergences with GRW? Here’s the code:

with pm.Model() as model:
    
    sigma = pm.HalfNormal('sigma', 300)
    alpha = pm.Uniform('alpha', 0, 1)
    mu = pm.GaussianRandomWalk('mu', 
                              sigma=sigma * (1. - alpha), 
                              shape=len(y)
                              )
    likelihood = pm.Normal('sales', 
                      mu=mu, 
                      sigma=sigma * alpha, 
                      observed=y
                          )

Also, I am not sure how to scale this model to the entire dataset? Should I also try to build a hierarchical model here or use the Multivariate GRW class?

Btw, the SDE model is blazing fast with no divergences but, interestingly, the posterior dist samples look a lot noisier compared to GRW and the fit is not visually as good.

Thank you.

Topic		Replies	Views
Slow speed of NUTS in Hierarchical model Questions	14	2120	October 20, 2020
Improving model convergence and sampling speed Questions	8	6302	September 27, 2019
Poor inference results with hierarchical model v5 modeling	4	330	June 26, 2023
Sampler occasionally gets static in pymc3.1, but not in pymc3.0 Questions development	19	2201	July 5, 2018
My first (hierarchical) PyMC model is very slow v5	5	843	October 2, 2023

NUTS sampler slowing down on a hierarchical model

Related topics