I’ve got a dataset with about 4,000,000 observations and about 22,000 entities, plus 5 categorical predictors. I’m trying to estimate the conversion rate (CVR) for each entity for every possible combination of the predictors.
When I sample 100 entities and fit the model with ADVI, it takes about 7 minutes. When I try to run it on all the entities, the progress bar indicates it will take several hours (which I could probably live with), but it stalls after several minutes.
I’ve been trying to find potential solutions to this problem, and so far I’ve come up with these:
- Minibatching: I’ve tried this, and it helps a lot, but it doesn’t completely solve the problem. When fitting on the large dataset, it still hangs somewhere before it finishes (a sketch of what I tried is below this list).
- Fitting on a smaller split of the data that doesn’t hang, and then using the parameter estimates as priors for training on the next split. One of the problems here is that I can’t figure out how to carry over what I’ve learned about the Cholesky decomposition from one model fit to the next.
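Roughly, the minibatching from the first bullet looks like the sketch below. This is simplified: it assumes PyMC v5 (where pm.Minibatch slices several arrays jointly), the batch size and the mb_* names are just illustrative, and the priors are the same as in the full model further down.

import pymc as pm

# slice the index arrays and the observed counts together so every batch stays aligned
mb_entity, mb_A, mb_B, mb_clicks, mb_trans = pm.Minibatch(
    entity_ids, A_ids, B_ids,
    data.clicks.values, data.transactions.values,
    batch_size=50_000,  # illustrative
)

with pm.Model(coords=coords) as model:
    # ... same priors as in the full model (b_ent_A, b_ent_B, ..., b_ent_const) ...
    A_effect = b_ent_A[mb_entity, mb_A]
    B_effect = b_ent_B[mb_entity, mb_B]
    const_effect = b_ent_const[mb_entity]
    cvr = pm.math.invlogit(A_effect + B_effect + const_effect)  # plus the C/D/E effects

    # total_size rescales the minibatch log-likelihood to the full ~4M rows
    pm.Binomial(
        "transactions",
        p=cvr,
        n=mb_clicks,
        observed=mb_trans,
        total_size=len(data),
    )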
What would you guys do in my situation to tackle this problem? Here’s the model I’m fitting:
import pymc as pm


def build_model(data, coords, entity_ids, A_ids, B_ids, C_ids, D_ids, E_ids):
    with pm.Model(coords=coords) as model:
        ## A
        chol_A, Rho_A, sigma_ent_A = pm.LKJCholeskyCov(
            "chol_cov_A", n=len(coords["A"]), eta=2,
            sd_dist=pm.Exponential.dist(1.0), compute_corr=True,
        )
        b_A = pm.Normal("b_A", 0, 10, dims="A")  # prior for the average slope
        # per-entity slopes, correlated across the levels of A
        b_ent_A = pm.MvNormal("b_ent_A", mu=b_A, chol=chol_A, dims=("entity_id", "A"))
        A_effect = b_ent_A[entity_ids, A_ids]

        ## B (same structure as A)
        chol_B, Rho_B, sigma_ent_B = pm.LKJCholeskyCov(
            "chol_cov_B", n=len(coords["B"]), eta=2,
            sd_dist=pm.Exponential.dist(1.0), compute_corr=True,
        )
        b_B = pm.Normal("b_B", 0, 10, dims="B")  # prior for the average slope
        b_ent_B = pm.MvNormal("b_ent_B", mu=b_B, chol=chol_B, dims=("entity_id", "B"))
        B_effect = b_ent_B[entity_ids, B_ids]

        ## C
        ...
        ## D
        ...
        ## E
        ...

        ## Const (per-entity intercept)
        sigma_const = pm.Exponential("sigma_const", 1.0)
        b_const = pm.Normal("b_const", 0, 10)
        b_ent_const = pm.Normal("b_ent_const", mu=b_const, sigma=sigma_const, dims="entity_id")
        const_effect = b_ent_const[entity_ids]

        cvr = pm.Deterministic(
            "cvr",
            pm.math.invlogit(A_effect + B_effect + C_effect + D_effect + E_effect + const_effect),
        )
        transactions = pm.Binomial(
            "transactions", p=cvr, n=data.clicks.values, observed=data.transactions.values
        )
    return model
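And this is roughly how I’m fitting it with ADVI, using the build_model function above (the iteration counts are illustrative):

model = build_model(data, coords, entity_ids, A_ids, B_ids, C_ids, D_ids, E_ids)

with model:
    approx = pm.fit(n=100_000, method="advi")  # iteration count is illustrative
    idata = approx.sample(1_000)              # draw posterior samples from the fitted approximation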