How to use pm.Data with pm.Minibatch?

Hello,

I’m currently fitting a model and want to see predictions on unseen data that was not used in the model fitting process. I know that you can simply reset the “incoming” data before posterior predictive sampling using pm.set_data(new_data={...}), but this requires first defining data variables using pm.Data() when defining the initial model.

Problem: I use VI for parameter estimation and would like to make use of minibatches. But if I define the minibatches first and then turn them into data variables (which is required, since I can’t reset them otherwise), I get a TypeError: Shared variable values can not be symbolic, when setting e.g. obs = pm.Data("obs", obs_minibatch).

This used to work in older versions of pymc as seen in this notebook.

Unfortunately, this doesn’t seem to work anymore. What is the workaround now? Should I define a new, identical model with data variables instead of minibatches and then use posterior predictive sampling on the trace of the previous model, as mentioned here, or are there other ways?

Minibatch should happen on top of pm.Data, not the other way around. I have no idea if that notebook actually ran or if people changed the code without re-running it.

CC @fonnesbeck

That’s odd. I will fix it.

The notebook is now fixed.

Thank you @ricardoV94 for the suggestion, now it works! And thank you @fonnesbeck for updating the notebook!

Maybe one additional question: If I use a minibatch size of e.g. 500 and want to do posterior predictive sampling on 1000 samples, it will still only produce as many predictions as the minibatch size. Is the only workaround, again, to define a new identical model without the minibatch and use the trace of the initial model to sample?

Yes, exactly. The notebook now demonstrates this. There is an issue in the tracker about removing minibatches when doing predictive sampling, but for now you need to specify that prediction model.