I don't have any insight into exactly why VI gives such a horrible result. I suspect it has something to do with multimodality (in the components), or with a local minimum.
For what it's worth, reducing the number of training iterations and changing to another optimizer might help:
[Edit]: this doesn't seem to be a good solution; see the post below.
import pymc3 as pm
import numpy as np
import theano
import theano.tensor as tt
from pymc3.distributions.transforms import t_stick_breaking
np.random.seed(1)
nd = 10
sample = np.random.randint(0, 10000, nd)
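def mix(components, decomp):
    # Prepend a column of zeros so the first logit in each row is fixed at 0;
    # this keeps the softmax identifiable.
    return tt.dot(decomp, tt.nnet.softmax(
        tt.horizontal_stack(tt.zeros((nd, 1)), components)))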
with pm.Model() as model:
    decomp = pm.Dirichlet('decomp', np.ones(10), shape=(1, 10),
                          transform=t_stick_breaking(1e-9))
    components = pm.Normal('components', shape=(nd, nd-1))
    combined = pm.Deterministic('combined', mix(components, decomp))
    obs = pm.Multinomial('obs', np.sum(sample), combined, observed=sample)
    mean_field = pm.fit(method='advi', n=int(1e4), obj_optimizer=pm.adam(),
                        progressbar=False)
decomp = mean_field.bij.rmap(mean_field.mean.get_value())
print(theano.config.floatX)
print(t_stick_breaking(1e-9).backward(decomp['decomp_stickbreaking__']).eval())
Note that I also did some refactoring to make sure the softmax doesn't make the model unidentified: the first logit in each row is now fixed at zero.
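To illustrate why that refactoring matters, here is a minimal NumPy-only sketch (not part of the model above; the `softmax` and `pinned` helpers are just illustrative names I made up). It shows that a softmax over `nd` free logits is unchanged when a constant is added to each row, so many different logit values map to the same probabilities, whereas with the first logit pinned at zero the remaining `nd - 1` logits can be recovered uniquely from the probabilities.

```python
import numpy as np

def softmax(x):
    # Row-wise softmax, shifted by the row max for numerical stability.
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def pinned(components):
    # Hypothetical helper: prepend a zero column, mirroring
    # tt.horizontal_stack(tt.zeros((nd, 1)), components) in the model.
    return np.hstack([np.zeros((components.shape[0], 1)), components])

rng = np.random.default_rng(0)
nd = 10

# Unconstrained case: nd free logits per row.
free_logits = rng.normal(size=(nd, nd))
shift = rng.normal(size=(nd, 1))  # a different constant for each row
# Shifting every logit in a row leaves the probabilities unchanged,
# so the logits are not identified by the likelihood.
same_probs = np.allclose(softmax(free_logits), softmax(free_logits + shift))

# Pinned case: nd - 1 free logits per row, first logit fixed at 0.
components = rng.normal(size=(nd, nd - 1))
probs = softmax(pinned(components))
# The free logits are the log-ratios against the first category,
# so they can be read back exactly from the probabilities.
recovered = np.log(probs[:, 1:] / probs[:, :1])
unique = np.allclose(recovered, components)

print(same_probs, unique)
```

This only addresses the shift invariance of the softmax itself; it says nothing about other sources of multimodality in the model, such as the interaction between `decomp` and `components`.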