How to use pm.Data with pm.Minibatch?

LLehner · February 4, 2025, 5:07pm

Hello,

I’m currently fitting a model and want to see predictions on unseen data that was not used in the model fitting process. I know that you can simply reset the “incoming” data before posterior predictive sampling using pm.set_data(new_data={...}), but this requires first defining data variables using pm.Data() when defining the initial model.

The problem: I use VI for parameter estimation and would like to use minibatches. But if I define the minibatches first and then turn them into data variables (which is required, since I can’t reset them otherwise), I get a TypeError: Shared variable values can not be symbolic, for example when setting obs = pm.Data("obs", obs_minibatch).

This used to work in older versions of pymc as seen in this notebook.

Unfortunately, this doesn’t seem to work anymore. What is the workaround now? Should I define a new, identical model with data variables instead of minibatches and then run posterior predictive sampling on the trace of the previous model, as mentioned here, or are there other ways?

ricardoV94 · February 11, 2025, 7:37am

Minibatch should be applied on top of pm.Data, not the other way around. I have no idea whether that notebook actually ran or whether people changed the code without re-running it.

CC @fonnesbeck

fonnesbeck · February 11, 2025, 8:48pm

That’s odd. I will fix it.

fonnesbeck · February 12, 2025, 10:27pm

The notebook is now fixed.

LLehner · February 12, 2025, 10:50pm

Thank you @ricardoV94 for the suggestion, now it works! And thank you @fonnesbeck for updating the notebook!

Maybe one additional question: If I use a minibatch size of e.g. 500 and want to do posterior predictive sampling on 1000 samples, it will still only sample as many points as the size of the minibatch. Is the only workaround, again, to define a new, identical model without the minibatch and use the trace of the initial model to sample?

fonnesbeck · February 12, 2025, 11:48pm

Yes, exactly. The notebook now demonstrates this. There is an issue in the tracker for removing minibatches when doing predictive sampling, but for now you need to specify that prediction model yourself.
