I would like to get some pointers for using ADVI Minibatch with large data sets (1-5M records). I wrote a hierarchical model that uses ADVI Minibatch and it runs pretty quickly when I use 20-50K records – at ~400 it/s.
However, if I try using the full data set, it slows down to ~5 it/s. I was under the impression that minibatch was just indexing my data set, so I would expect it to take longer in total (more iterations) but that the time per iteration wouldn’t be affected.
This is for doing BetaBinomial regression as you can see below in the code. X.shape ~ (n_records, n_features) with n_records ~1M and n_features ~20. There’s 30 groups (“clusters”) so len(set(cluster_idx)) = 30.
What am I missing?
def minibatch_beta_binomial_uvi(X, y, cluster_idx, batch_size=1500, model_name='default'): ''' Run variational inference for hierarchical bayesian logistic regression based on binomial-count data Params: X: Design matrix (n_records x n_features) y: Binomial response variable array([(n_0, k_0), (n_1, k_1), ... ]) cluster_idx: Array with the cluster index for each records batch_size: Optional batch size for Minibatching model_name: Optional name for the model ''' # Get successes (k) and trials (n) from y k = np.expand_dims(y[:,1], axis=1) n = y.sum(axis=1) n = np.expand_dims(n, axis=1) # Number of clusters n_cluster_0 = len(set(cluster_idx)) # Generate minibatches X_t = pm.Minibatch(X, batch_size=batch_size) k_t = pm.Minibatch(k, batch_size=batch_size) n_t = pm.Minibatch(n, batch_size=batch_size) cluster_idx_0_t = pm.Minibatch(cluster_idx, batch_size=batch_size) with pm.Model() as beta_binomial_model: # Intercept Priors sd_intr = pm.HalfCauchy('intr_sd', beta=2.5) mu_intr = pm.Cauchy(name='mu_intr', alpha=0, beta=5) b_intr = pm.Normal('b_intr', mu=mu_intr, sd=sd_intr, shape=(n_cluster_0, 1)) # Weights priors sigma_m = pm.HalfCauchy('sigma_m', beta=5, shape=(n_cluster_0, 1)) mu_m = pm.Normal('mu_m', mu=0, sd=5 ** 2, shape=(n_cluster_0, 1)) b_u = pm.Normal(name='b_u', mu=mu_m[cluster_idx_0_t], sd=sigma_m[cluster_idx_0_t], shape=(batch_size, X.shape) u_mu_arg = tt.reshape((X_t * b_u).sum(axis=1), (batch_size, 1)) # Logit transformation mu = pm.math.invlogit(b_intr[cluster_idx_0_t] + u_mu_arg) # Hyperpriors on Beta parameters # Here, the beta distribution is reparametrized by mu and kappa (the population mean and the "sample size" -proportional to inverse of the variance– respectively) kappa_log = pm.Exponential('kappa_log', lam=1.5) kappa = pm.Deterministic('kappa', tt.exp(kappa_log)) alpha = pm.Deterministic('alpha', mu*kappa) beta = pm.Deterministic('beta', (1-mu)*kappa) yobs_name = 'Y_obs_' + model_name pm.BetaBinomial(yobs_name, n=n_t, alpha=alpha, beta=beta, observed=k_t, total_size=X.shape) with beta_binomial_model: approx = pm.fit(10000, method='advi') return approx