Serializing BART models for out of sample data

brontidon · October 28, 2025, 8:34pm

Good Afternoon!

After training a BART model, I have had some difficulty saving the model and then loading it in a separate notebook.

What I have tried:

1. Written a class that fits a BART model, computes diagnostics, saves the idata, and predicts on test data in MLflow (all within the same instance of the class). I am not using mlflow.pyfunc at this time, mostly because I am still testing what I'd like MLflow to track.
2. Saved the model instance and idata using cloudpickle.
3. Loaded the idata and model in a separate notebook.
4. Created an out-of-sample data set and, using pm.set_data, sampled from the posterior predictive of the model.

To accomplish step 4, I set predictions=True in pm.sample_posterior_predictive and passed my out-of-sample covariate frame as my 'X' variable. For my 'Y' variable, I passed a dummy vector with the same length as X, following these threads: "Out of model predictions with PyMC" and "Categorical BART with Out of Sample Predictions".

Results are inconsistent here: sometimes the load process breaks, and other times it seems to work fine.
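One possible cause (a guess, not something I have confirmed) is a mismatch in Python or package versions between the two notebooks. The cloudpickle documentation warns that it is only meant for moving objects between identical Python versions and discourages using it for long-term storage. The sketch below is one way to check for this: it stores an environment stamp next to the serialized bytes and compares it on load, before unpickling. The helper names save_with_stamp and load_with_check are hypothetical, and the standard-library pickle stands in for cloudpickle here.

```python
import pickle
import platform
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def env_stamp(packages):
    """Record the Python version and the versions of the given packages."""
    stamp = {"python": platform.python_version()}
    for package in packages:
        stamp[package] = installed_version(package)
    return stamp


def save_with_stamp(obj, path, packages):
    """Hypothetical helper: store the object as bytes next to an environment stamp."""
    bundle = {"env": env_stamp(packages), "payload": pickle.dumps(obj)}
    with open(path, "wb") as f:
        pickle.dump(bundle, f)


def load_with_check(path, packages):
    """Hypothetical helper: compare environments first, then unpickle the payload."""
    with open(path, "rb") as f:
        bundle = pickle.load(f)  # only a dict of strings and bytes at this point
    current = env_stamp(packages)
    mismatches = {
        name: (saved, current.get(name))
        for name, saved in bundle["env"].items()
        if current.get(name) != saved
    }
    if mismatches:
        # Surface the difference before attempting to rebuild the object.
        print("Environment differs from save time:", mismatches)
    return pickle.loads(bundle["payload"]), mismatches
```

Calling save_with_stamp(model, "model.pkl", ["pymc", "pymc-bart", "pytensor"]) in the training notebook and load_with_check with the same package list in the second notebook would at least show whether the failures line up with an environment difference.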

I have seen some alternative attempts to serialize the model, including the CDCgov/BART-Survival repository on GitHub. However, the methods used there to directly save and load the tree structure do not seem to exist in the current version of pymc-bart.

Additionally, I have tried to adapt my workflow to the approach of loading the trace and sampling from it, as in the radon example, but I am not sure where to start with extracting the terminal nodes from the trees.

Thank you very much for your time!

brontidon · October 28, 2025, 8:39pm

Apologies in advance for the preemptive tag, @aloctavodia, but do you have any recommendations here? I’ve noticed you’ve done a lot of sweet work bringing BART to PyMC, and I am not sure whether there is a similar use case on this forum that I have not seen yet.

mikesmith5446 · October 29, 2025, 12:28am

Have you tried this?

brontidon · October 29, 2025, 1:27am

Hi!

I’ve tried to incorporate this into my workflow (along with mlflow.pyfunc), but the main challenge so far has been how to define it in a way that lets me store and call the tree structure (which would be analogous to the mean variables in the example). I could be very wrong, but in order to use something like this, I would need to extract all of the tree leaves first.

The BART-Survival package claims that you can do this with the model.f.owner method, but that does not seem to exist in the current version of pymc-bart.
