Model with a lot of categories is very slow to sample

Hi, I’m trying to use Bayesian analysis for Kaggle’s uncertainty competition, where the goal is to predict Walmart sales of various items at different percentiles.

Using a subset of the data, with the shape parameter equal to 3049 (the number of unique items) and 715,402 rows in train, this model slows to a crawl (~3 it/s):

import pymc3 as pm

with pm.Model() as model:
    lam = pm.Exponential('lam', lam=1/train_mu, shape=ca_1.item_index.nunique())
    pm.Poisson('obs', mu=lam[train['item_index'].values], observed=train['value'])
    traces = pm.sample(1000, cores=1)

According to the FAQ, models are slow either because the gradient takes a long time to compute, or because the sampler has to compute it many times. I think in my case it’s the latter. The recommended diagnostic is to profile the log-probability/gradient function:

import numpy as np

func = model.logp_dlogp_function(profile=True)
func.set_extra_values({})
x = np.random.randn(func.size)
%timeit func(x)

func.profile.summary()

This prints out a bunch of diagnostics I don’t really understand, along with a Theano suggestion to set th.config.floatX = 'float32'. I tried that and it didn’t seem to make any difference.

What are some strategies to handle this situation?

For that many observations, you probably want to use ADVI with mini-batches:

https://docs.pymc.io/notebooks/variational_api_quickstart.html
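
Something like this might work for your model (an untested sketch: the batch size and number of iterations are placeholders, and train, train_mu, and ca_1 are the objects from your post):

import pymc3 as pm

# batch size is a placeholder; tune it to your data
batch_size = 500

# Minibatch views of the data; the shared default random seed keeps the
# item indices and observed values aligned on every draw
item_idx_mb = pm.Minibatch(train['item_index'].values, batch_size=batch_size)
value_mb = pm.Minibatch(train['value'].values, batch_size=batch_size)

with pm.Model() as model:
    lam = pm.Exponential('lam', lam=1/train_mu, shape=ca_1.item_index.nunique())
    # total_size rescales the minibatch log-likelihood to the full data set
    pm.Poisson('obs', mu=lam[item_idx_mb], observed=value_mb,
               total_size=len(train))
    approx = pm.fit(n=30000, method='advi')
    trace = approx.sample(1000)

pm.fit returns the fitted approximation, and approx.sample(1000) gives you a trace you can use much like the output of pm.sample.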
