Average loss and MiniBatch size

I have the following mixture of regressions model:

import numpy as np
import pymc3 as pm
import theano
import theano.tensor as tt

K = 3  # number of mixture components
D = x_vals.shape[1]  # dimensionality of the regressors

X = theano.shared(x_vals)
Y = theano.shared(y)
with pm.Model() as model:
    pi = pm.Dirichlet('pi', np.ones(K))
    ws = [pm.Normal('w_%d' % i, mu=0, sd=1, shape=(D,)) for i in range(K)]
    sigma = pm.Uniform('sigma', 0, 1)
    mu = tt.stack([tt.dot(X, w) for w in ws], axis=1)
    y_obs = pm.NormalMixture('y_obs', pi, mu, sd=sigma, observed=Y)

which I fit with ADVI using different minibatch sizes. What I find weird is that the smaller the minibatch size, the smaller the value the ELBO seems to settle at, almost halving every time I halve the minibatch size. The code for fitting is shown below.

batch_size = 64
X_m = pm.Minibatch(x_vals, batch_size)  # minibatch view of the design matrix
Y_m = pm.Minibatch(y, batch_size)       # minibatch view of the targets
with model:
    approx = pm.fit(100000,
                    more_replacements={X: X_m, Y: Y_m},  # swap full-data shareds for minibatches
                    callbacks=[pm.callbacks.CheckParametersConvergence(tolerance=1e-4)],
                    method='fullrank_advi')
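
For reference, here is roughly how I look at where the ELBO settles; it is just a sketch that plots the loss history pm.fit stores on the returned approximation:

import matplotlib.pyplot as plt

plt.plot(approx.hist)   # negative ELBO ("average loss") recorded at each iteration
plt.xlabel('iteration')
plt.ylabel('average loss')
plt.show()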

Is there something I’m missing here? Maybe the mixture of regressions was a bad idea to begin with? Any thoughts?

Hmm, you did not specify total_size in your observed variable; I would imagine your logp (and, as a consequence, the ELBO) is not scaled properly, so you cannot compare it across different batch sizes.
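
For intuition: with minibatches the data term of the ELBO is estimated from the current batch only, so it has to be rescaled by the total number of rows over the batch size to stay on the full-data scale. Schematically (with N the total number of rows and B the current minibatch):

$$
\widehat{\mathrm{ELBO}} \;=\; \mathbb{E}_{q(\theta)}\!\left[\frac{N}{|B|}\sum_{i \in B}\log p(y_i \mid \theta)\;+\;\log p(\theta)\;-\;\log q(\theta)\right]
$$

Without that N/|B| factor the observed-data term is summed over only |B| points, which is why the loss roughly halves when you halve the batch size.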

I’m not sure which function you’re referring to when you mention the total_size parameter. There are about 20,000 rows to deal with.

y_obs = pm.NormalMixture('y_obs', pi, mu, sd=sigma, observed=Y, total_size=len(y))
