Minibatch not working

Hi all,

It seems that pm.Minibatch may not be correctly handling batches of data and might be training the model on only one batch of data (making the size of the total training data equal to the batch size). This is only my suspicion (given the model outputs and warning message I’m receiving) as I am unsure what is happening in the backend.


Here are the codes to generate dummy data and to run the model using minibatch:

# generate data
N = 10000
P = 3
rng = np.random.default_rng(88)
X = rng.uniform(2, 10, size=(N, 3))
beta = np.array([1.5, 0.2, -0.9])
y = np.matmul(X, beta) + rng.normal(0, 1, size=(N,))

# minibatch
X_mb = pm.Minibatch(X, batch_size=100)
y_mb = pm.Minibatch(y, batch_size=100)

# model with minibatch
with pm.Model() as model_mb:
    b = pm.Normal("b", mu=0, sigma=3, shape=(P,))
    sigma = pm.HalfCauchy("sigma", 1)
    mu = pt.matmul(X_mb, b)
    likelihood = pm.Normal(
        "likelihood", mu=mu, sigma=sigma, observed=y_mb, total_size=N
    )

    fit_mb = pm.fit(
        n=100000,
        method="advi",
        progressbar=True,
        callbacks=[pm.callbacks.CheckParametersConvergence()],
        random_seed=88,
    )
    idata_mb = fit_mb.sample(500)

Output:

100.00% [100000/100000 00:34<00:00 Average Loss = 287.9]

Finished [100%]: Average Loss = 287.88

UserWarning: Could not extract data from symbolic observation likelihood warnings.warn(f"Could not extract data from symbolic observation {obs}")


Here are the codes for the same model using the full data and without minibatch:

# model no minibatch
with pm.Model() as model:
    b = pm.Normal("b", mu=0, sigma=3, shape=(P,))
    sigma = pm.HalfCauchy("sigma", 1)
    mu = pt.matmul(X, b)
    likelihood = pm.Normal("likelihood", mu=mu, sigma=sigma, observed=y, total_size=N)

    fit = pm.fit(
        n=100000,
        method="advi",
        progressbar=True,
        callbacks=[pm.callbacks.CheckParametersConvergence()],
        random_seed=88,
    )
    idata = fit.sample(500)

Output

100.00% [100000/100000 03:44<00:00 Average Loss = 14,196]


When comparing the posteriors of both models with the true beta parameter, the model using minibatch data performs poorly.

# compare models
fig, ax = plt.subplots(nrows=2, ncols=3, figsize=(10, 4), layout="constrained")
az.plot_posterior(
    idata_mb,
    var_names="b",
    ref_val=beta.tolist(),
    ax=ax[0, :],
    textsize=8,
)
az.plot_posterior(idata, var_names="b", ref_val=beta.tolist(), ax=ax[1, :], textsize=8)

for i in range(3):
    ax[1, i].set_xlim(ax[0, i].get_xlim())

ax[0, 0].annotate(
    text="Minibatch",
    xy=(-0.5, 0.5),
    xycoords="axes fraction",
    rotation=90,
    size=15,
    fontweight="bold",
    va="center",
)
ax[1, 0].annotate(
    text="Full Data",
    xy=(-0.5, 0.5),
    xycoords="axes fraction",
    rotation=90,
    size=15,
    fontweight="bold",
    va="center",
)


Any suggestions on how to overcome this issue?

Thanks

Additional info:

  • Package versions from the original post were pymc=3.11.0 and pytensor=2.18.6
  • Still getting the weird pm.Minibatch behaviours in pymc=3.12.0 and pytensor=2.19.0

Would it be better to submit an issue report in github?

Maybe @ferrine can spot the problem

@mcao @ricardoV94 @ferrine

I was facing the same problem.
I debugged it using your Example and I thought there were two bugs.
The first is that you are generating separate mini batches for X and Y. Fixing this problem would eliminate the following Warning.

UserWarning: RNG Variable RandomGeneratorSharedVariable(<Generator(PCG64) at 0x151ADB760>) has multiple clients. This is likely an inconsistent random graph.

The second point is that total_size is not working. The original assumption seems to be that total_size is an argument that scales logp as it is specified, but that does not appear to be working. Therefore, I thought I could solve this by manually calculating and scaling the logp.
I will share the code.

However, I too do not know if this is the perfect fix. If there are any experts out there, I would like to know what the solution is.

import numpy as np
import pymc as pm
import pytensor.tensor as pt
import matplotlib.pyplot as plt
import arviz as az

# generate data
N = 10000
P = 3
rng = np.random.default_rng(88)
X = rng.uniform(2, 10, size=(N, 3))
beta = np.array([1.5, 0.2, -0.9])
y = np.matmul(X, beta) + rng.normal(0, 1, size=(N,))

# minibatch
batch_size = 100
X_mb = pm.Minibatch(X, batch_size=batch_size)
y_mb = pm.Minibatch(y, batch_size=batch_size)

# model with minibatch
with pm.Model() as model_mb:
    b = pm.Normal("b", mu=0, sigma=3, shape=(P,))
    sigma = pm.HalfCauchy("sigma", 1)
    mu = pt.matmul(X_mb, b)
    likelihood = pm.Normal(
        "likelihood", mu=mu, sigma=sigma, observed=y_mb, total_size=N
    )

    fit_mb1 = pm.fit(
        n=100000,
        method="advi",
        progressbar=True,
        callbacks=[pm.callbacks.CheckParametersConvergence()],
        random_seed=88,
    )
    idata_mb1 = fit_mb1.sample(500)


# minibatch
batch_size = 100
X_mb, y_mb = pm.Minibatch(X, y, batch_size=batch_size)
#y_mb = pm.Minibatch(y, batch_size=100)

# model with minibatch
with pm.Model() as model_mb:
    b = pm.Normal("b", mu=0, sigma=3, shape=(P,))
    sigma = pm.HalfCauchy("sigma", 1)
    mu = pt.matmul(X_mb, b)
    likelihood = pm.Normal(
        "likelihood", mu=mu, sigma=sigma, observed=y_mb, total_size=N
    )

    fit_mb2 = pm.fit(
        n=100000,
        method="advi",
        progressbar=True,
        callbacks=[pm.callbacks.CheckParametersConvergence()],
        random_seed=88,
    )
    idata_mb2 = fit_mb2.sample(500)


# minibatch
batch_size = 100
X_mb, y_mb = pm.Minibatch(X, y, batch_size=batch_size)
#y_mb = pm.Minibatch(y, batch_size=100)

# model with minibatch
with pm.Model() as model_mb:
    b = pm.Normal("b", mu=0, sigma=3, shape=(P,))
    sigma = pm.HalfCauchy("sigma", 1)
    mu = pt.matmul(X_mb, b)
    pm.Potential("likelihood", (N/batch_size)*pm.logp(rv=pm.Normal.dist(mu=mu, sigma=sigma), value=y_mb))

    fit_mb3 = pm.fit(
        n=100000,
        method="advi",
        progressbar=True,
        callbacks=[pm.callbacks.CheckParametersConvergence()],
        random_seed=88,
    )
    idata_mb3 = fit_mb3.sample(500)


# model no minibatch
with pm.Model() as model:
    b = pm.Normal("b", mu=0, sigma=3, shape=(P,))
    sigma = pm.HalfCauchy("sigma", 1)
    mu = pt.matmul(X, b)
    likelihood = pm.Normal("likelihood", mu=mu, sigma=sigma, observed=y, total_size=N)

    fit = pm.fit(
        n=100000,
        method="advi",
        progressbar=True,
        callbacks=[pm.callbacks.CheckParametersConvergence()],
        random_seed=88,
    )
    idata = fit.sample(500)


# compare models
fig, ax = plt.subplots(nrows=4, ncols=3, figsize=(10, 8), layout="constrained")
az.plot_posterior(
    idata_mb1,
    var_names="b",
    ref_val=beta.tolist(),
    ax=ax[0, :],
    textsize=8,
)
az.plot_posterior(
    idata_mb2,
    var_names="b",
    ref_val=beta.tolist(),
    ax=ax[1, :],
    textsize=8,
)
az.plot_posterior(
    idata_mb3,
    var_names="b",
    ref_val=beta.tolist(),
    ax=ax[2, :],
    textsize=8,
)
az.plot_posterior(idata, var_names="b", ref_val=beta.tolist(), ax=ax[3, :], textsize=8)

for i in range(3):
    ax[1, i].set_xlim(ax[0, i].get_xlim())

ax[0, 0].annotate(
    text="Minibatch1",
    xy=(-0.5, 0.5),
    xycoords="axes fraction",
    rotation=90,
    size=15,
    fontweight="bold",
    va="center",
)
ax[1, 0].annotate(
    text="Minibatch2",
    xy=(-0.5, 0.5),
    xycoords="axes fraction",
    rotation=90,
    size=15,
    fontweight="bold",
    va="center",
)
ax[2, 0].annotate(
    text="Minibatch3",
    xy=(-0.5, 0.5),
    xycoords="axes fraction",
    rotation=90,
    size=15,
    fontweight="bold",
    va="center",
)
ax[3, 0].annotate(
    text="Full Data",
    xy=(-0.5, 0.5),
    xycoords="axes fraction",
    rotation=90,
    size=15,
    fontweight="bold",
    va="center",
)

1 Like

Minibatch 2 and 3 should be equivalent, and the HDI seem to be roughly the same, although the axis are messed up? Is it coming from odd samples far off?

1 Like

And yes, one needs to pass all the variables to a single Minibatch call so that their slices are identical, otherwise it would randomly pair some X with unrelated y

@ricardoV94
Thank you for your reply!
Oh, I missed that the results for Minibatch 2 and minibatch3 are the same.
Am I right in thinking that I need to specify the total_size when the size of the minibatch is not constant?

And yes, one needs to pass all the variables to a single Minibatch call so that their slices are identical, otherwise it would randomly pair some X with unrelated y

I think the Minibatch part of the following example is incorrect.

1 Like

Isn’t the size of the minibatch required to be constant? Anyway, you have to specify everytime you use minibatch, or else the model logp won’t be scaled to the full size.

Quite possible. Would you mind opening an issue in (and / or even fixing it as well?): GitHub - pymc-devs/pymc-examples: Examples of PyMC models, including a library of Jupyter notebooks.

2 Likes

Thank you!
I opened an issue.

1 Like

Thanks @ricardoV94, and thank you @Yamaguchi for running some tests and finding out what the problem was.

Minibatch 2 and the Full Data are equivalent when adjusting the axis.

But now I ran into another problem: the model doesn’t seem to be registering/storing observed variables afterwards. So, when using pm.sample_posterior_predictive, it only uses a subset of the data where the subset is the same size as the minibatch size. And, idata xarray doesn’t have the observed_data group.

# minibatch
X_mb, y_mb = pm.Minibatch(X, y, batch_size=100)

# model with minibatch
with pm.Model() as model_mb2:
    b = pm.Normal("b", mu=0, sigma=3, shape=(P,))
    sigma = pm.HalfCauchy("sigma", 1)
    mu = pt.matmul(X_mb, b)
    likelihood = pm.Normal(
        "likelihood", mu=mu, sigma=sigma, observed=y_mb, total_size=N
    )

    fit_mb2 = pm.fit(
        n=100000,
        method="advi",
        progressbar=True,
        callbacks=[pm.callbacks.CheckParametersConvergence()],
        random_seed=88,
    )
    idata_mb2 = fit_mb2.sample(500)


with model_mb2:
    pm.sample_posterior_predictive(idata_mb2, extend_inferencedata=True)

Giving this message:

UserWarning: Could not extract data from symbolic observation likelihood
  warnings.warn(f"Could not extract data from symbolic observation {obs}")
print("posterior_predictive dims:", dict(idata_mb2.posterior_predictive.dims))
print("posterior dims", dict(idata_mb2.posterior.dims))

posterior_predictive dims: {‘chain’: 1, ‘draw’: 500, ‘likelihood_dim_2’: 100}
posterior dims {‘chain’: 1, ‘draw’: 500, ‘b_dim_0’: 3}

So, a workaround for the time being for me is:

with pm.Model() as model_new:
    b = pm.Normal("b", mu=0, sigma=3, shape=(P,))
    sigma = pm.HalfCauchy("sigma", 1)
    mu = pt.matmul(X, b)
    y_pred = pm.Normal("y_pred", mu=mu, sigma=sigma, observed=y, shape=y.shape)

    ppc_new = pm.sample_posterior_predictive(
        idata_mb,
        var_names=["y_pred"],
        predictions=False,
    )

I open up a separate issue on github related to the model not storing the observed data.

Thanks.

@mcao I am getting the same error for an ADVI approximation with mini batches, once I sample from the posterior approximation.

Could you link the relevant github issue (I wasn’t able to find it) so that I can follow. Thanks a lot!

Hi @Benjamin, thanks for the reminder; I’ve just opened an issue here: BUG: Minibatch posterior predictive sampling returns incorrect data size · Issue #7521 · pymc-devs/pymc · GitHub

What version are you using? I re-ran the scenario using the version below

pymc version: 5.17.0
pytensor version: 2.25.4

I didn’t get an error message, but sample_posterior_predictive continues to return prediction data with the minibatch sizes instead of the full dataset.

A possible workaround for now would be to get the idata inference data from the trained model with pm.Minibatch minibatch sampling as you normally would. Then, pass this idata onto a new but similar model for posterior sampling.

# generate data
N = 10000
P = 3
rng = np.random.default_rng(88)
X_full = rng.uniform(2, 10, size=(N, 3))
beta = np.array([1.5, 0.2, -0.9])
y_full = np.matmul(X_full, beta) + rng.normal(0, 1, size=(N,))

# minibatch
X_mb, y_mb = pm.Minibatch(X_full, y_full, batch_size=100)

# model with minibatch
with pm.Model() as model_mb:
    b = pm.Normal("b", mu=0, sigma=3, shape=(P,))
    sigma = pm.HalfCauchy("sigma", 1)
    mu = pm.Deterministic("mu", pt.matmul(X_mb, b))
    likelihood = pm.Normal(
        "likelihood", mu=mu, sigma=sigma, observed=y_mb, total_size=N
    )

    fit_mb = pm.fit(
        n=100000,
        method="advi",
        progressbar=True,
        callbacks=[pm.callbacks.CheckParametersConvergence()],
        random_seed=88,
    )
    idata_mb = fit_mb.sample(500)

    pm.sample_posterior_predictive(idata_mb, extend_inferencedata=True)
    idata_mb.posterior = pm.compute_deterministics(
        idata_mb.posterior, merge_dataset=True
    )
with pm.Model() as model_preds:

    X = pm.Data("X", X_full)
    y = pm.Data("y", y_full)

    b = pm.Normal("b", mu=0, sigma=3, shape=(P,))
    sigma = pm.HalfCauchy("sigma", 1)
    mu = pm.Deterministic("mu", pt.matmul(X, b))
    likelihood = pm.Normal("likelihood", mu=mu, sigma=sigma, observed=y)
with model_preds:
    pm.set_data({"X": X_full})
    ypreds = pm.sample_posterior_predictive(idata_mb)