Minibatch Giving Inf Loss

I’ve been trying to follow the Minibatching tutorial here: Introduction to Variational Inference with PyMC — PyMC example gallery

But when I modify it to (1) use data containers and (2) adapt it to my data, I get infinite losses and incorrect predictions when I use minibatching. I’m hoping someone can help me. I must be doing something wrong, because minibatching speeds up the fitting process even when the batch size is 1, and increasing the batch size seems to have no effect on how long the fit takes.

import numpy as np
import pandas as pd
import pymc as pm
from sklearn.metrics import r2_score

def sigmoid(z):
    return 1/(1 + np.exp(-z))

def generate_df(n_entities):
    """Simulate per-entity conversion data: a random intercept per entity, transactions = floor(cvr * clicks)."""
    const = -1.5
    combinations = np.random.randint(1, 20, size=n_entities)
    entity_ids = np.repeat(np.arange(n_entities), combinations)
    size = np.sum(combinations)

    entity_const_coefs = {i: x for i, x in enumerate(np.random.normal(0, 1, size=n_entities))}
    clicks = np.random.randint(1, 100, size=size)
    cvrs = []
    for ent in entity_ids:
        ent_const_coef = entity_const_coefs.get(ent)
        cvr = sigmoid(ent_const_coef+ const)
        cvrs.append(cvr)
    cvrs = np.array(cvrs)

    df = pd.DataFrame({"cvrs": cvrs, "entity_id": entity_ids, "clicks": clicks})
    df["transactions"] = df["cvrs"] * df["clicks"]
    df["transactions"] = df["transactions"].apply(np.floor)

    return df, entity_const_coefs

n_entities = 1000
df, entity_const_coefs = generate_df(n_entities)

np.random.seed(42)
coords = {
    "entity_id": np.arange(n_entities)
}

## First Version Without Minibatching. Works great, but is slow
with pm.Model(coords = coords) as model:
    clicks = pm.Data("clicks", df.clicks.values)
    transactions = pm.Data("transactions", df.transactions.values)
    entity_ids = pm.Data("entity_ids", df.entity_id.values)

    mu = pm.Normal("mu", 0, 1, dims="entity_id")
    const = pm.Normal("const", 0, 1) 
    cvr = pm.Deterministic("cvr", pm.math.invlogit(mu[entity_ids] + const))
    likelihood = pm.Binomial("likelihood", p=cvr, n=clicks, observed=transactions, total_size=df.shape[0])

    advi = pm.ADVI()
    advifit = advi.fit(10000)
    
with model:
    trace = advifit.sample(1000)
    posterior_predictive = pm.sample_posterior_predictive(trace, var_names = ["cvr"])
    
estimated_cvrs = posterior_predictive.posterior_predictive.cvr.mean(dim=("chain", "draw")).values
estimated_transactions = estimated_cvrs * df.clicks
print("R^2", r2_score(df.transactions, estimated_transactions)) # Usually around 99%

## Second Version with Minibatching. Oddly, I've set the batch size to 1, and it still goes much faster. Changing the batch size doesn't seem to affect how fast it goes.
with pm.Model(coords = coords) as model:
    clicks = pm.Data("clicks", df.clicks.values)
    transactions = pm.Data("transactions", df.transactions.values)
    entity_ids = pm.Data("entity_ids", df.entity_id.values)

    mu = pm.Normal("mu", 0, 1, dims="entity_id")
    const = pm.Normal("const", 0, 1) 
    cvr = pm.Deterministic("cvr", pm.math.invlogit(mu[entity_ids] + const))
    likelihood = pm.Binomial("likelihood", p=cvr, n=clicks, observed=transactions, total_size=df.shape[0])

    advi = pm.ADVI()
    
    batch_size = 1
    clicks_minibatch = pm.Minibatch(df.clicks.values.astype(np.int32), batch_size = batch_size)
    transactions_minibatch = pm.Minibatch(df.transactions.values.astype(np.float64), batch_size = batch_size)
    entity_ids_minibatch = pm.Minibatch(df.entity_id.values.astype(np.int32), batch_size = batch_size)
    
    advifit = advi.fit(
        10000,
        more_replacements={clicks: clicks_minibatch, transactions: transactions_minibatch, entity_ids: entity_ids_minibatch},
    )
    
with model:
    trace = advifit.sample(1000)
    posterior_predictive = pm.sample_posterior_predictive(trace, var_names = ["cvr"])
    
estimated_cvrs = posterior_predictive.posterior_predictive.cvr.mean(dim=("chain", "draw")).values
estimated_transactions = estimated_cvrs * df.clicks
print("R^2", r2_score(df.transactions, estimated_transactions)) ## Normally and R2 around 30%

I figured out part of the problem, but now I’m stuck again. When I change the n in the Binomial likelihood to some constant (one that’s greater than any of my observed transactions), everything works like a charm. The problem is that my n isn’t constant; it’s different for each observation. How would I get it to ‘understand’ that my n varies?
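To be concrete, this is the change I mean inside the minibatched model above (just a sketch; n_max is an arbitrary constant I picked that is larger than every observed transaction count):

n_max = int(df.clicks.max())  # arbitrary, but larger than any observed transaction count
likelihood = pm.Binomial("likelihood", p=cvr, n=n_max, observed=transactions, total_size=df.shape[0])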

I don’t know why you’re using replacements. Can you define your original model with the minibatches already?

Anyway, the biggest problem is that you have to minibatch all the related variables together so the random slices stay aligned. It looks like xb, yb = pm.Minibatch(a, b, batch_size=n).
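Applied to your arrays, it would look something like this (an untested sketch; the batch size is arbitrary):

batch_size = 128  # arbitrary
clicks_mb, transactions_mb, entity_ids_mb = pm.Minibatch(
    df.clicks.values,
    df.transactions.values,
    df.entity_id.values,
    batch_size=batch_size,
)

All three outputs then get sliced with the same random index on every iteration.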

Thank you for your reply. I minibatched all the related variables together and that seems to have fixed my problem. I find it odd, though, because based on what I saw here: How to make Minibatch for multi-dimensional data? - #4 by ckrapu, it seemed like it was supposed to be faster to set them all up separately. Granted, that was 4 years ago.

The reason I was using replacements is that I would like to be able to use the pm.set_data function later to sample with new data, and that didn’t appear to be possible if I used the minibatched data directly in my likelihood.
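For anyone who finds this later, here is roughly the pattern I landed on, reusing the model definition above (a sketch; the batch size is arbitrary and new_df stands in for whatever new data you want to predict on):

with model:
    # One Minibatch call so all three arrays share the same random slice.
    batch_size = 128  # arbitrary
    clicks_mb, transactions_mb, entity_ids_mb = pm.Minibatch(
        df.clicks.values, df.transactions.values, df.entity_id.values, batch_size=batch_size
    )
    advi = pm.ADVI()
    # Swap the minibatches in only for fitting; the pm.Data containers stay in the model.
    advifit = advi.fit(
        10000,
        more_replacements={clicks: clicks_mb, transactions: transactions_mb, entity_ids: entity_ids_mb},
    )

# Later, point the data containers at the new data (new_df is hypothetical) and sample cvr.
with model:
    pm.set_data({"clicks": new_df.clicks.values, "entity_ids": new_df.entity_id.values})
    trace = advifit.sample(1000)
    posterior_predictive = pm.sample_posterior_predictive(trace, var_names=["cvr"])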

Yes, the syntax changed from 4 years ago. Glad it is working now.