Complaint Monday - What has been bothering you about PyMC?

Feel free to write any issues / bugs / missing features that have been bothering you.

Would love to hear from our users :slight_smile:

Saving and then loading a trace and model so that you can make new out-of-sample predictions with a previously created model still feels awkward and clunky. And, given how common this practice is, it seems like there should be an example notebook rather than new users having to search the Discourse for how to do this.

What do you think of the “Using ModelBuilder class for deploying PyMC models” notebook in the PyMC example gallery?

It is being developed to try and answer the problem you mention.

Oh wow, I knew this was being developed but I was unaware that it had been added to pymc-experimental. This looks great, I’ll try it out.

Regarding my “complaint”, and getting philosophical for a second: I think PyMC exists in a realm where you build these amazing, highly descriptive models and infer and learn a lot about your data, but then it kind of stops there, because predicting on new data is time-consuming. Machine learning is more the opposite: you build models without really knowing why they work and don’t learn much about the data, but predicting on new data is really easy. So I think this functionality is going to be really powerful and will help bridge the gap between “learn about your data” and “deploy on new data”.

As a preliminary remark, I love the PyMC framework; I have been using it for years and I think it’s an amazing project. That being said, I usually fit models on a server, and since I often work with pretty heavy likelihood functions and a lot of data, the fitting time can easily exceed the maximum job length. This is a problem because there is still no way to interrupt sampling, save the trace, and resume from where it left off. McBackend is meant to do this, but something is broken (see this issue). I imagine this is a fairly common problem for other users as well, so any improvement on this front would be awesome!

I think that libraries like blackjax make resuming sampling much easier because the samplers are written in a functional paradigm. @junpenglao may be able to confirm or deny.

I wonder whether nutpie makes this easier as well. CC @aseyboldt

Our native samplers should also be able to do this but would require refactoring. Simple samplers like Metropolis and Slice should be easy (not too many tuning variables to keep track of). NUTS may be a tough beast though.
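To illustrate why a simple sampler is easy to resume, here is a standalone sketch of a random-walk Metropolis sampler written in a functional style: everything needed to continue the chain (current position, its log-density, the step size, and the RNG state) lives in one small object that can be pickled at the end of one job and loaded at the start of the next. This is not PyMC's internal sampler API; `log_prob`, `MetropolisState`, `init_state`, `run`, and `checkpoint.pkl` are all hypothetical names, and the target is a toy standard normal.

```python
import pickle
from dataclasses import dataclass

import numpy as np


def log_prob(x: np.ndarray) -> float:
    """Hypothetical target: standard normal log-density (up to a constant)."""
    return -0.5 * float(np.sum(x**2))


@dataclass
class MetropolisState:
    position: np.ndarray
    logp: float
    step_size: float
    rng_state: dict  # numpy bit-generator state, so random draws continue exactly


def init_state(x0, step_size=0.5, seed=0):
    rng = np.random.default_rng(seed)
    x0 = np.asarray(x0, dtype=float)
    return MetropolisState(x0, log_prob(x0), step_size, rng.bit_generator.state)


def run(state, n_steps):
    """Advance the chain n_steps; return (new_state, draws) without mutating `state`."""
    rng = np.random.default_rng()
    rng.bit_generator.state = state.rng_state  # restore the exact RNG stream
    x, logp = state.position.copy(), state.logp
    draws = np.empty((n_steps, x.size))
    for i in range(n_steps):
        proposal = x + state.step_size * rng.standard_normal(x.size)
        proposal_logp = log_prob(proposal)
        # Metropolis acceptance for a symmetric (Gaussian random-walk) proposal
        if np.log(rng.uniform()) < proposal_logp - logp:
            x, logp = proposal, proposal_logp
        draws[i] = x
    return MetropolisState(x, logp, state.step_size, rng.bit_generator.state), draws


state = init_state(np.zeros(2))

# One uninterrupted run of 1000 steps
_, full = run(state, 1000)

# The same run split across two "jobs", with a checkpoint written in between
state_a, first = run(state, 500)
with open("checkpoint.pkl", "wb") as f:
    pickle.dump(state_a, f)
with open("checkpoint.pkl", "rb") as f:
    resumed = pickle.load(f)
_, second = run(resumed, 500)

print(np.array_equal(full, np.concatenate([first, second])))
```

Because the RNG state is checkpointed along with the position, the resumed run reproduces the uninterrupted one draw for draw. A NUTS checkpoint would have to capture more, for example the step-size and mass-matrix adaptation state while still tuning.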

Yep, you can use blackjax for that, as it is written to be more modular. I think it won’t even take that much refactoring for our native sampler to do so.

Thanks for the help!

Unfortunately, my model does not compile with any of those (nutpie, blackjax, numpyro), and I haven’t found a way to debug the problem. It likely has to do with some complicated shape issue or the scan operation. It runs fine with pm.sample, and parameter recovery even worked well with a (much) smaller dataset than the one I have (though variational inference did not: the posterior for the last component of a Dirichlet was systematically underestimated).