Hi, I’m trying to use Bayesian analysis in Kaggle’s uncertainty competition, where the goal is to predict Walmart sales of various items at different percentiles.
Using a subset of the data, with lam of shape 3049 and size of train = 715402, this model slows to a crawl (~3 it/s):
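import pymc3 as pm

with pm.Model() as model:
    # One Poisson rate per item, with an exponential prior
    lam = pm.Exponential('lam', lam=1/train_mu, shape=ca_1.item_index.nunique())
    # Each observation uses the rate of its own item
    pm.Poisson('obs', mu=lam[train['item_index'].values], observed=train['value'])
    traces = pm.sample(1000, cores=1)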
According to the FAQ, models are slow either because the gradient takes a long time to compute, or because the sampler has to compute a lot of them. I think in my case it’s because it has to compute a lot of them. The FAQ recommends profiling the gradient function like this:
import numpy as np

func = model.logp_dlogp_function(profile=True)
func.set_extra_values({})
x = np.random.randn(func.size)
# %timeit is an IPython/Jupyter magic
%timeit func(x)
func.profile.summary()
This prints out a bunch of diagnostics I don’t really understand, along with some Theano suggestions to set th.config.floatX = 'float32'. I tried that and it didn’t seem to make any difference.
What are some strategies to handle this situation?
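One candidate strategy, sketched here on toy data rather than the competition data: with a single Poisson rate per item, the log-likelihood depends on the observations only through each item's row count and total, so the 715402 per-row terms could in principle be collapsed to one term per item. The function names and the toy values below are hypothetical, and the sketch uses only the standard library to check that the two forms agree.

```python
import math
from collections import defaultdict


def poisson_loglik_full(lam, item_index, values):
    """Sum the Poisson log-pmf with one term per observation row."""
    return sum(
        y * math.log(lam[i]) - lam[i] - math.lgamma(y + 1)
        for i, y in zip(item_index, values)
    )


def sufficient_stats(item_index, values):
    """Collapse rows to per-item row counts and totals, plus a constant."""
    n = defaultdict(int)   # number of rows per item
    s = defaultdict(int)   # total sales per item
    const = 0.0            # sum of log(y!) terms, independent of lam
    for i, y in zip(item_index, values):
        n[i] += 1
        s[i] += y
        const += math.lgamma(y + 1)
    return n, s, const


def poisson_loglik_grouped(lam, n, s, const):
    """Same log-likelihood, with one term per item instead of per row."""
    return sum(s[i] * math.log(lam[i]) - n[i] * lam[i] for i in n) - const


# Toy data (hypothetical): three items, six rows
item_index = [0, 0, 1, 2, 2, 2]
values = [3, 0, 5, 1, 2, 4]
lam = [1.5, 4.0, 2.5]

n, s, const = sufficient_stats(item_index, values)
full = poisson_loglik_full(lam, item_index, values)
grouped = poisson_loglik_grouped(lam, n, s, const)
print(math.isclose(full, grouped, rel_tol=1e-9))
```

If this applies to the real model, the same idea might be expressed in PyMC3 by observing the per-item totals with mu = n * lam, since a sum of n independent Poisson(lam) draws is Poisson(n * lam); that would leave 3049 likelihood terms per gradient evaluation instead of 715402. It only holds as long as every row of an item shares the same rate.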