Trace from PyMC3 being used in PyMC 4.0

I estimated posterior samples for many models with a large number of variables, which I saved using ArviZ. Because of the large number of models and variables, I would prefer not to run the posterior sampling again if possible. However, I saw that PyMC 4.0 performs posterior predictive sampling faster than PyMC3, which would be very helpful for me. Can I use a trace estimated with PyMC3 with a model formulated in PyMC 4.0 for posterior predictive sampling?

Thank you!

Yes, the origin of the InferenceData is not relevant at all. The only requirement is that the variable names in the posterior group of your InferenceData match the variable names in the PyMC model you are using to sample the posterior predictive. If that is the case, the InferenceData could have been created with PyMC3 or even Turing in Julia; pm.sample_posterior_predictive will sample using the provided posterior.
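For example, a minimal sketch (the file name, data, and variable names here are hypothetical, and I'm assuming the old trace was saved to NetCDF with ArviZ):

import arviz as az
import numpy as np
import pymc as pm

# Dummy data just to make the sketch self-contained
observed_y = np.random.normal(size=57)

# Load the posterior that was sampled and saved with PyMC3
# ("old_pymc3_trace.nc" is a hypothetical file name)
idata = az.from_netcdf("old_pymc3_trace.nc")

# Rebuild an equivalent model in PyMC 4.0; the variable names must
# match the names in idata.posterior
with pm.Model() as model:
    mu = pm.Normal("mu", 0.0, 1.0)
    sigma = pm.Exponential("sigma", 1.0)
    pm.Normal("y", mu=mu, sigma=sigma, observed=observed_y)

    # Draw posterior predictive samples using the provided posterior
    ppc = pm.sample_posterior_predictive(idata)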

This is great! Thank you very much. I am very excited to use PyMC 4.0; it will definitely allow me to increase the efficiency of large-scale projects.

Regards

One question: after running some tests, sample_posterior_predictive from PyMC 4.0 runs much slower than fast_sample_posterior_predictive from PyMC3. The difference is around 8 seconds. Is this expected with the new version?

Can you share the model or some details about it?

It is a hierarchical model with a StudentT likelihood, and for the formulation I followed a non-centered parameterization.

with pm.Model() as hmodel:

    # Hyperpriors
    mu_a = pm.Normal('alpha', mu=0.0, sigma=0.5)
    sigma_a = pm.Exponential('sigma_alpha', lam=1)
    sigma_a2 = pm.Exponential('sigma_alpha2', lam=1)
    mu_b = pm.Normal("beta", mu=0.5, sigma=0.2)
    sigma_b = pm.Exponential('sigma_beta', lam=1)
    mu_b2 = pm.Normal("beta2", mu=0, sigma=0.5)
    sigma_b2 = pm.Exponential('sigma_beta2', lam=1)

    # Varying intercepts by each water year category
    za_wtr_yr = pm.Normal('za_wtr_yr', mu=0.0, sigma=1.0, shape=3)
    za2_wtr_yr = pm.Normal('za2_wtr_yr', mu=0.0, sigma=1.0, shape=3)

    # Varying slopes by each water year category
    zb_wtr_yr = pm.Normal('zb_wtr_yr', mu=0.0, sigma=1.0, shape=3)
    zb2_wtr_yr = pm.Normal('zb2_wtr_yr', mu=0.0, sigma=1.0, shape=3)

    eps = pm.Exponential('eps', 1)

    id_wtr_yr = pm.Data("id_wtr_yr", id_wtr_yr_)
    id_wtr_yr_lag = pm.Data("id_wtr_yr_lag", id_wtr_yr_lag_)
    gw_pump_semi = pm.Data("gw_pump_semi", gw_pump_semi_)
    gw_pump_semi_lag = pm.Data("gw_pump_semi_lag", gw_pump_semi_lag_)

    depth_est = (
        (mu_a + za_wtr_yr[id_wtr_yr] * sigma_a)
        + (za2_wtr_yr[id_wtr_yr_lag] * sigma_a2)
        + (mu_b + zb_wtr_yr[id_wtr_yr] * sigma_b) * gw_pump_semi
        + (mu_b2 + zb2_wtr_yr[id_wtr_yr_lag] * sigma_b2) * gw_pump_semi_lag
    )

    nu = pm.Gamma("nu", alpha=2, beta=0.1)

    depth_like = pm.StudentT('depth_like', nu=nu, mu=depth_est, sigma=eps, observed=df.std_depth_semi)

Later I use this model for forecasting:

with hmodel:
    pm.set_data({"gw_pump_semi": [pump],
                 "gw_pump_semi_lag": [pump_lag],
                 "id_wtr_yr_lag": [wtr_yr_lag],
                 "id_wtr_yr": [wtr_yr]})
    p_post = pm.sample_posterior_predictive(trace=gwtrace, random_seed=800, var_names=["depth_like"])

What is the size of your observations, and what is the extent of the slowdown you observe?

The size is 57 observations. pm.sample_posterior_predictive in PyMC 4.0.0 takes 8 seconds on average, while pm.fast_sample_posterior_predictive in PyMC3 takes 1.5 seconds.

That difference could be the function compilation alone… You could try passing compile_kwargs={"mode": "FAST_COMPILE"} to sample_posterior_predictive. It's also possible that your model is just slower than the old fast sampling routine.

compile_kwargs is not defined in sample_posterior_predictive in PyMC 4.0.0:

<function pymc.sampling.sample_posterior_predictive(trace, samples: Optional[int] = None, model: Optional[pymc.model.Model] = None, var_names: Optional[List[str]] = None, size: Optional[int] = None, keep_size: Optional[bool] = None, random_seed=None, progressbar: bool = True, mode: Union[str, aesara.compile.mode.Mode, NoneType] = None, return_inferencedata: bool = True, extend_inferencedata: bool = False, predictions: bool = False, idata_kwargs: dict = None) → Union[arviz.data.inference_data.InferenceData, Dict[str, numpy.ndarray]]>

I guess I need to pass it through the mode argument instead (a str or aesara.compile.mode.Mode).
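Something like this, as a minimal sketch reusing the names from my model above (assuming the mode string is just forwarded to the compiled Aesara function):

with hmodel:
    # Same call as before, but asking Aesara for a faster (less optimized) compilation
    p_post = pm.sample_posterior_predictive(
        trace=gwtrace,
        random_seed=800,
        var_names=["depth_like"],
        mode="FAST_COMPILE",
    )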

Using mode="FAST_COMPILE" it is actually about 1 second slower, running in 10 seconds now.

I installed PyMC from conda-forge in a new environment and it is using Aesara 2.6.6. Could it be a version issue?

Yes, that should also work. I don't think there is anything wrong on your side; it's just a case where the new code is slower than what we had before in fast_sample_posterior_predictive.

Also, if you run it several times, do you always see the same time difference? The very first run should be a bit slower because the compilation hasn't been cached yet.
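For example, something along these lines (a rough timing sketch reusing your hmodel and gwtrace names):

import time

with hmodel:
    for i in range(3):
        start = time.perf_counter()
        pm.sample_posterior_predictive(
            trace=gwtrace, random_seed=800, var_names=["depth_like"]
        )
        # The first iteration pays the compilation cost; later ones
        # should benefit from Aesara's compilation cache
        print(f"run {i}: {time.perf_counter() - start:.1f} s")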

I see, thank you! The first time is definitely slower. The 9 seconds I mentioned are measured after compilation is done, when the model is used for forecasting; I ran it a couple of hundred times and the timing was consistent.