I’m running a model on a fairly large dataset. I’ve just taken a job with unlimited use of the Google Cloud Platform, and I assumed that if I chose a higher-performing CPU and more memory, sampling would noticeably speed up.
It hasn’t. The data I’m sampling has around 500,000 observations: a time series of daily data over five years, with multiple items forecasted per year.
I’m currently using a 56-core, 112 GB RAM setup. No GPU, as I’ve actually never run PyMC with numpyro/JAX on a GPU. Would a GPU help?
Do any of you have suggestions to try?
For reference, here are my model and program versions:
```python
with pm.Model(coords=coords) as model:
    # item_idx = pm.Data('item_idx', items, dims="obs_id", mutable=False)
    k = pm.Normal('k', 0, 1)
    m = pm.Normal('m', 0, 5)
    delta = pm.Laplace('delta', 0, 0.1, shape=n_changepoints)

    # Piecewise-linear trend with changepoints
    growth = k + at.dot(A, delta)
    offset = m + at.dot(A, -s * delta)
    trend = growth * t + offset

    # beta_weekly = pm.Normal('beta_weekly_seasonality', 0, 1, shape=weekly_n_components * 2)
    # seasonality_weekly = at.dot(fourier(t, p=7), beta_weekly)
    # beta_monthly = pm.Normal('beta_monthly_seasonality', 0, 1, shape=monthly_n_components * 2)
    # seasonality_monthly = at.dot(fourier(t, p=30.5), beta_monthly)
    # beta_yearly = pm.Normal('beta_yearly_seasonality', 0, 1, shape=yearly_n_components * 2)
    # seasonality_yearly = at.dot(fourier(t, p=365.25), beta_yearly)

    error = pm.HalfCauchy('sigma', 0.5)
    pm.Normal("predicted_sales", trend, error, observed=train_y)

    trace = pymc.sampling_jax.sample_numpyro_nuts(tune=1000, chains=4)
```
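For context, the pieces not shown above (`t`, `s`, the changepoint matrix `A`, and the `fourier` helper) look roughly like this — a simplified NumPy sketch, with illustrative shapes and values rather than my exact code:

```python
import numpy as np

# Illustrative setup for the names the model relies on.
n_obs = 1826                                      # ~5 years of daily data
t = np.linspace(0, 1, n_obs)                      # scaled time
n_changepoints = 25
s = np.linspace(0, 0.8, n_changepoints + 1)[1:]   # changepoint locations in scaled time
A = (t[:, None] >= s) * 1.0                       # (n_obs, n_changepoints) indicator matrix

def fourier(t, p=365.25, n=3):
    """Fourier seasonality features: n cos/sin pairs with period p."""
    x = 2 * np.pi * np.arange(1, n + 1) * t[:, None] / p
    return np.concatenate((np.cos(x), np.sin(x)), axis=1)
```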
- PyMC/PyMC3 Version: 4.0.0b6
- Aesara/Theano Version: 2.5.1
- Python Version: 3.7.12
- Operating system: Debian via Google Cloud Platform
- How did you install PyMC/PyMC3: `pip install pymc --pre` (Installation Guide (Linux) · pymc-devs/pymc Wiki · GitHub)