Trace from PyMC3 being used in PyMC 4.0

I estimated posterior samples for many models with a large number of variables, which I saved using ArviZ. Because of the large number of models and variables, I would prefer not to run the posterior sampling again if possible. However, I saw that PyMC 4.0 performs posterior predictive sampling faster than PyMC3, which would be very helpful for me. Can I use a trace estimated with PyMC3 with a model formulated in PyMC 4.0 for posterior predictive sampling?

Thank you!

Yes, the origin of the InferenceData is not relevant at all. The only requirement is that the variable names in the posterior group of your InferenceData match the variable names in the PyMC model you are using to sample the posterior predictive. If that is the case, the InferenceData could have been created with PyMC3 or even Turing in Julia; pm.sample_posterior_predictive will sample using the provided posterior.
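For example, a minimal sketch (the file name, data, and variable names here are hypothetical, and I'm assuming the old trace was saved to NetCDF with ArviZ):

import arviz as az
import numpy as np
import pymc as pm

# Dummy data just to make the sketch self-contained
observed_y = np.random.normal(size=57)

# Load the posterior that was sampled and saved with PyMC3
# ("old_pymc3_trace.nc" is a hypothetical file name)
idata = az.from_netcdf("old_pymc3_trace.nc")

# Rebuild an equivalent model in PyMC 4.0; the variable names must
# match the names in idata.posterior
with pm.Model() as model:
    mu = pm.Normal("mu", 0.0, 1.0)
    sigma = pm.Exponential("sigma", 1.0)
    pm.Normal("y", mu=mu, sigma=sigma, observed=observed_y)

    # Draw posterior predictive samples using the provided posterior
    ppc = pm.sample_posterior_predictive(idata)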

This is great! Thank you very much. I am very excited to use PyMC 4.0; it will definitely allow me to increase the efficiency of large-scale projects.

Regards

One question: after running some tests, sample_posterior_predictive from PyMC 4.0 runs much slower than fast_sample_posterior_predictive from PyMC3. The difference is around 8 seconds. Is this expected with the new version?

Can you share the model or some details about it?

It is a hierarchical model with a StudentT likelihood, and for the formulation I followed a non-centered parameterization.

with pm.Model() as hmodel:

    # Hyperpriors
    mu_a = pm.Normal('alpha', mu=0.0, sigma=0.5)
    sigma_a = pm.Exponential('sigma_alpha', lam=1)
    sigma_a2 = pm.Exponential('sigma_alpha2', lam=1)
    mu_b = pm.Normal("beta", mu=0.5, sigma=0.2)
    sigma_b = pm.Exponential('sigma_beta', lam=1)
    mu_b2 = pm.Normal("beta2", mu=0, sigma=0.5)
    sigma_b2 = pm.Exponential('sigma_beta2', lam=1)

    # Varying intercepts by each water year category
    za_wtr_yr = pm.Normal('za_wtr_yr', mu=0.0, sigma=1.0, shape=3)
    za2_wtr_yr = pm.Normal('za2_wtr_yr', mu=0.0, sigma=1.0, shape=3)

    # Varying slopes by each water year category
    zb_wtr_yr = pm.Normal('zb_wtr_yr', mu=0.0, sigma=1.0, shape=3)
    zb2_wtr_yr = pm.Normal('zb2_wtr_yr', mu=0.0, sigma=1.0, shape=3)

    eps = pm.Exponential('eps', 1)

    id_wtr_yr = pm.Data("id_wtr_yr", id_wtr_yr_)
    id_wtr_yr_lag = pm.Data("id_wtr_yr_lag", id_wtr_yr_lag_)
    gw_pump_semi = pm.Data("gw_pump_semi", gw_pump_semi_)
    gw_pump_semi_lag = pm.Data("gw_pump_semi_lag", gw_pump_semi_lag_)

    depth_est = (
        (mu_a + za_wtr_yr[id_wtr_yr] * sigma_a)
        + (za2_wtr_yr[id_wtr_yr_lag] * sigma_a2)
        + (mu_b + zb_wtr_yr[id_wtr_yr] * sigma_b) * gw_pump_semi
        + (mu_b2 + zb2_wtr_yr[id_wtr_yr_lag] * sigma_b2) * gw_pump_semi_lag
    )

    nu = pm.Gamma("nu", alpha=2, beta=0.1)

    depth_like = pm.StudentT('depth_like', nu=nu, mu=depth_est, sigma=eps, observed=df.std_depth_semi)

Later I use this model for forecasting:

with hmodel:
    pm.set_data({"gw_pump_semi": [pump],
                 "gw_pump_semi_lag": [pump_lag],
                 "id_wtr_yr_lag": [wtr_yr_lag],
                 "id_wtr_yr": [wtr_yr]})
    p_post = pm.sample_posterior_predictive(trace=gwtrace, random_seed=800, var_names=["depth_like"])

What is the size of your observations, and what is the extent of the slowdown you observe?

The size is 57 observations. pm.sample_posterior_predictive in PyMC 4.0.0 takes 8 seconds on average, while pm.fast_sample_posterior_predictive in PyMC3 takes 1.5 seconds.

That difference could be the function compilation alone… You could try passing compile_kwargs={"mode": "FAST_COMPILE"} to sample_posterior_predictive. It's also possible that your model is just slower than the old fast sampling routine.

compile_kwargs is not defined in sample_posterior_predictive in PyMC 4.0.0:

<function pymc.sampling.sample_posterior_predictive(trace, samples: Optional[int] = None, model: Optional[pymc.model.Model] = None, var_names: Optional[List[str]] = None, size: Optional[int] = None, keep_size: Optional[bool] = None, random_seed=None, progressbar: bool = True, mode: Union[str, aesara.compile.mode.Mode, NoneType] = None, return_inferencedata: bool = True, extend_inferencedata: bool = False, predictions: bool = False, idata_kwargs: dict = None) → Union[arviz.data.inference_data.InferenceData, Dict[str, numpy.ndarray]]>

I guess I need to pass it through the mode argument instead (a str or aesara.compile.mode.Mode).
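Something like this, as a minimal sketch reusing the names from my model above (assuming the mode string is just forwarded to the compiled Aesara function):

with hmodel:
    # Same call as before, but asking Aesara for a faster (less optimized) compilation
    p_post = pm.sample_posterior_predictive(
        trace=gwtrace,
        random_seed=800,
        var_names=["depth_like"],
        mode="FAST_COMPILE",
    )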

Using mode="FAST_COMPILE" it is actually about 1 second slower, running in 10 seconds now.

I installed PyMC from conda-forge in a new environment and it is using Aesara 2.6.6. Could it be a version issue?

Yes, that should also work. I don't think there is anything wrong on your side; it's just a case where the new code is slower than what we had before in fast_sample_posterior_predictive.

Also, if you run it several times, do you always see the same time difference? The very first run should be a bit slower because the compilation hasn't been cached yet.
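For example, something along these lines (a rough timing sketch reusing your hmodel and gwtrace names):

import time

with hmodel:
    for i in range(3):
        start = time.perf_counter()
        pm.sample_posterior_predictive(
            trace=gwtrace, random_seed=800, var_names=["depth_like"]
        )
        # The first iteration pays the compilation cost; later ones
        # should benefit from Aesara's compilation cache
        print(f"run {i}: {time.perf_counter() - start:.1f} s")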

I see, thank you! The first time is definitely slower. The 9 seconds I mentioned are measured after compilation is done, when the model is used for forecasting; I ran it a couple of hundred times and the timing was consistent.