pm.Minibatch Doc string

DrEntropy · May 23, 2025, 10:57pm

I am (re)learning to use ADVI with PyMC and was working through the "Introduction to Variational Inference with PyMC" example in the PyMC example gallery.
In that example, the authors show a rather extensive docstring for pm.Minibatch, but the current docstring for Minibatch is just this anemic one:

Get random slices from variables from the leading dimension.

    Parameters
    ----------
    variable: TensorVariable
    variables: TensorVariable
    batch_size: int

    Examples
    --------
    >>> data1 = np.random.randn(100, 10)
    >>> data2 = np.random.randn(100, 20)
    >>> mdata1, mdata2 = Minibatch(data1, data2, batch_size=10)

Is the original docstring incorrect? For example, is this still correct: “Importantly, we need to make PyMC “aware” that a minibatch is being used in inference. Otherwise, we will get the wrong :math:`logp` for the model. To do so, we need to pass the total_size parameter to the observed node, which correctly scales the density of the model logp that is affected by Minibatch. See more in the examples below.”
If so, is this documented somewhere else now?

jessegrabowski · May 24, 2025, 3:42am

Yes, you need to pass total_size. If you don’t, you should see a warning when you call pm.fit.

I agree the docstring is pretty sad. PRs welcome
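
To illustrate why the scaling matters, here is a minimal sketch in plain Python (no PyMC involved). It assumes a simple Normal likelihood with made-up data and arbitrary parameter values, and the helper `normal_logpdf` is hypothetical, written only for this illustration. It compares the log-likelihood of the full data set with the average log-likelihood of random minibatches, with and without rescaling by `total_size / batch_size`, which is the correction that passing total_size to the observed node is meant to apply.

```python
import math
import random


def normal_logpdf(x, mu, sigma):
    """Log density of Normal(mu, sigma) at x (hypothetical helper for this sketch)."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)


rng = random.Random(0)
total_size = 1000
batch_size = 100

# Made-up data for illustration only.
data = [rng.gauss(1.0, 2.0) for _ in range(total_size)]

# Arbitrary parameter values at which the likelihood is evaluated.
mu, sigma = 0.5, 2.0

# Log-likelihood of the full data set.
full_logp = sum(normal_logpdf(x, mu, sigma) for x in data)

# Average the raw and the rescaled minibatch log-likelihoods over many random batches.
n_draws = 2000
raw = 0.0
scaled = 0.0
for _ in range(n_draws):
    batch = rng.sample(data, batch_size)  # random batch, drawn without replacement
    batch_logp = sum(normal_logpdf(x, mu, sigma) for x in batch)
    raw += batch_logp / n_draws
    scaled += batch_logp * (total_size / batch_size) / n_draws

print(f"full data logp:         {full_logp:.1f}")
print(f"mean raw batch logp:    {raw:.1f}")
print(f"mean scaled batch logp: {scaled:.1f}")
```

The unscaled minibatch term is roughly ten times too small here, so the prior would carry far more weight relative to the data than it should. The rescaled term matches the full-data log-likelihood on average.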
