Parameter data structure for model portability

I’m finding myself building PyMC models on a variety of problem statements with some overlap. A parameter (or set of parameters) which I estimated in one model may be useful in another model. Is there any best practice that folks have found for managing a library of PyMC models with some interrelation? My current thought is to dump a sampling of the trace into a dict with some appropriate metadata for that particular parameter set.

A related problem is adding parameters to a hierarchical model. For instance, in a hierarchical model of gas mileage by auto manufacturer, I’ve trained the model and saved the trace, but then a new manufacturer comes on the market and I want to estimate its parameters and update the hyperparameters. I suppose I need a good schema to index the parameters and track which manufacturer they are associated with, but maybe there are solutions to this problem already?

I’m a chemical engineer, writing scripts out of necessity more than training, so perhaps what I need is a good resource on data architecture? If so, does anyone have recommendations for an accessible intro to data architecture?


I’ve found that xarray is a good lightweight solution. ArviZ (the plotting and diagnostics library associated with PyMC3) already uses it as its default format for storing traces. As a bonus, the on-disk storage format is netCDF, which has good libraries for manipulation in R and other languages in case you need interoperability.
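
Here is a minimal sketch of what that could look like for the gas mileage example. The variable names (`mu_mpg`, `hyper_mu`), the manufacturer labels, and the metadata keys are all made up for illustration, and the draws are simulated rather than sampled from a real model. The point is only that a named `manufacturer` coordinate lets you look up parameters by label instead of by array position.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)

# Hypothetical posterior draws: 2 chains x 500 draws for 3 manufacturers.
# In practice these would come from the sampler; here they are simulated.
manufacturers = ["acme", "bolt", "cyan"]
posterior = xr.Dataset(
    data_vars={
        "mu_mpg": (
            ("chain", "draw", "manufacturer"),
            rng.normal(30.0, 2.0, size=(2, 500, 3)),
        ),
        "hyper_mu": (("chain", "draw"), rng.normal(30.0, 1.0, size=(2, 500))),
    },
    coords={
        "chain": np.arange(2),
        "draw": np.arange(500),
        "manufacturer": manufacturers,
    },
    # Free-form metadata travels with the data.
    attrs={"model": "mpg_hierarchical", "units": "miles per gallon"},
)

# Select one manufacturer's draws by label rather than by position.
bolt = posterior["mu_mpg"].sel(manufacturer="bolt")

# Draws for a new manufacturer (simulated here) with the same chain/draw layout.
new = xr.DataArray(
    rng.normal(28.0, 2.0, size=(2, 500, 1)),
    dims=("chain", "draw", "manufacturer"),
    coords={
        "chain": np.arange(2),
        "draw": np.arange(500),
        "manufacturer": ["dyne"],
    },
    name="mu_mpg",
)

# Append the new manufacturer along the existing dimension.
extended = xr.concat([posterior["mu_mpg"], new], dim="manufacturer")
```

Note that the `concat` step is only bookkeeping: it keeps the new manufacturer’s draws indexed alongside the old ones, but it does not update the hyperparameters. That still requires fitting a model that includes the new group.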

You can also store multiple datasets in a single netCDF file, for example as separate groups. That way, you can keep a single file for all of your sampled traces and easily load / update them.