I have the following mixture of regressions model:
import numpy as np
import pymc3 as pm
import theano
import theano.tensor as tt

K = 3                  # number of mixture components
D = x_vals.shape[1]    # input dimensionality

X = theano.shared(x_vals)
Y = theano.shared(y)

with pm.Model() as model:
    # mixture weights over the K components
    pi = pm.Dirichlet('pi', a=np.ones(K))
    # one regression coefficient vector per component
    ws = [pm.Normal('w_%d' % i, mu=0, sd=1, shape=(D,)) for i in range(K)]
    sigma = pm.Uniform('sigma', 0, 1)
    # per-component means, stacked into an (N, K) matrix
    mu = tt.stack([tt.dot(X, w) for w in ws], axis=1)
    y_obs = pm.NormalMixture('y_obs', w=pi, mu=mu, sd=sigma, observed=Y)
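For what it's worth, I did sanity-check the model before fitting; this is roughly what I looked at (relying on the test values Theano attaches to each node):

    # mu should be (N, K): one mean per component for every data point
    print(mu.tag.test_value.shape)
    # all starting log-probabilities should be finite
    print(model.check_test_point())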
I fit this model with ADVI using different minibatch sizes. What I find weird is that the smaller the minibatch size, the smaller the value the ELBO seems to settle at, almost halving every time I halve the minibatch size. The fitting code is shown below.
batch_size = 64
X_m = pm.Minibatch(x_vals, batch_size)
Y_m = pm.Minibatch(y, batch_size)

with model:
    approx = pm.fit(
        100000,
        more_replacements={X: X_m, Y: Y_m},
        callbacks=[pm.callbacks.CheckParametersConvergence(tolerance=1e-4)],
        method='fullrank_advi',
    )
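One thing I'm not sure about is whether the minibatch log-likelihood is being rescaled to the full dataset at all; if it isn't, I'd expect the likelihood term of the ELBO to shrink roughly in proportion to the batch size, which matches what I'm seeing. If I understand the docs correctly, that rescaling is what total_size is for, so a variant of the model built directly on the minibatches would look something like this (just a sketch, I haven't verified that it fixes the scaling):

    with pm.Model() as scaled_model:
        pi = pm.Dirichlet('pi', a=np.ones(K))
        ws = [pm.Normal('w_%d' % i, mu=0, sd=1, shape=(D,)) for i in range(K)]
        sigma = pm.Uniform('sigma', 0, 1)
        mu = tt.stack([tt.dot(X_m, w) for w in ws], axis=1)
        # total_size should scale the minibatch log-likelihood up to the
        # full dataset, making ELBO values comparable across batch sizes
        y_obs = pm.NormalMixture('y_obs', w=pi, mu=mu, sd=sigma,
                                 observed=Y_m, total_size=len(y))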
Is there something I’m missing here? Maybe the mixture of regressions was a bad idea to begin with? Any thoughts?