Using ADVI for GMM


I am trying to see how well ADVI works on data generated from a GMM. I have results from both MCMC and ADVI, but I am unsure how to compare them. The ADVI run reports

Average Loss = 324.97: 100%|██████████| 20000/20000 [00:16<00:00, 1202.46it/s]
Finished [100%]: Average Loss = 324.97

and using MCMC I get

Sequential sampling (1 chains in 1 job)
Metropolis: [pi_stickbreaking__]
Metropolis: [mu_1]
Metropolis: [mu_0]
100%|██████████| 20500/20500 [00:16<00:00, 1217.33it/s]
Only one chain was sampled, this makes it impossible to run some convergence checks

Can someone please guide me on this?

Both take about the same time in this case, but the loss reported by ADVI is very high, which I assume is one of the disadvantages of using it. The estimates themselves are pretty much the same.

My results for ADVI are like this:

There are a couple of ways. Since you are generating the dataset from known parameters, the first step is to compare the estimates with the known values. A trace plot with the lines=_dict_of_true_parameters_ kwarg comes in handy here, as it lets you compare them directly.
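If you also want a number next to the plot, something like the following could work. It is only a sketch with NumPy: the draws are synthetic stand-ins (in practice you would pull them from your MCMC trace or from samples drawn from the ADVI approximation), and `compare_to_truth`, `true_mu` and the 94% interval are my own illustrative choices, not anything from your model. Sorting each draw is a crude guard against label switching and assumes one-dimensional, well-separated component means.

```python
import numpy as np


def compare_to_truth(samples, true_values):
    """Summarise posterior draws of the component means against known values.

    samples: array of shape (n_draws, n_components)
    true_values: array of shape (n_components,)
    """
    # Sort within each draw so that a swap of component labels between
    # draws does not blur the summary (assumes well-separated 1-D means).
    samples = np.sort(np.asarray(samples, dtype=float), axis=1)
    truth = np.sort(np.asarray(true_values, dtype=float))

    mean = samples.mean(axis=0)
    lower, upper = np.percentile(samples, [3, 97], axis=0)  # central 94% interval
    covered = (lower <= truth) & (truth <= upper)
    return mean, lower, upper, covered


# Synthetic stand-in for posterior draws of two component means.
rng = np.random.default_rng(0)
true_mu = np.array([-2.0, 3.0])  # hypothetical "known" values
draws = true_mu + 0.1 * rng.standard_normal((2000, 2))
# Swap the columns in every other draw to mimic label switching.
draws[::2] = draws[::2, ::-1].copy()

mean, lower, upper, covered = compare_to_truth(draws, true_mu)
for k in range(true_mu.size):
    print(f"mu_{k}: true={true_mu[k]:+.2f} mean={mean[k]:+.2f} "
          f"94% interval=[{lower[k]:+.2f}, {upper[k]:+.2f}] covered={covered[k]}")
```

Running the same summary on the MCMC draws and on the ADVI draws gives you two tables you can compare side by side, including how wide each method thinks the intervals are.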

A better way to quantify it is a posterior predictive check. For example, you can “predict” the true latent label of each data point using the fitted GMM, then plot the labelled data. Some examples can be seen in my WIP notebook (e.g., cell 16).
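To give a rough idea of the label prediction step, here is a minimal NumPy sketch. It assumes a one-dimensional mixture of normals and plugs in point estimates (for example posterior means) for the weights, means and standard deviations; `assign_labels` and all the numbers are made up for illustration, and a fuller check would repeat this over many posterior draws instead of a single point estimate.

```python
import numpy as np


def assign_labels(x, weights, means, sds):
    """Most probable component for each point, plus the membership probabilities."""
    x = np.asarray(x, dtype=float)[:, None]            # shape (n_points, 1)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    sds = np.asarray(sds, dtype=float)

    # log of weight_k * Normal(x | mean_k, sd_k) for every point and component
    log_pdf = -0.5 * ((x - means) / sds) ** 2 - np.log(sds) - 0.5 * np.log(2 * np.pi)
    log_joint = np.log(weights) + log_pdf

    # Normalise in log space for numerical stability.
    log_joint -= log_joint.max(axis=1, keepdims=True)
    resp = np.exp(log_joint)
    resp /= resp.sum(axis=1, keepdims=True)
    return resp.argmax(axis=1), resp


# Synthetic data with known labels (all values hypothetical).
rng = np.random.default_rng(1)
weights = np.array([0.4, 0.6])
means = np.array([-2.0, 3.0])
sds = np.array([1.0, 1.0])
true_labels = rng.choice(2, size=500, p=weights)
x = means[true_labels] + sds[true_labels] * rng.standard_normal(500)

# Here the "fitted" parameters are simply the true ones; in practice you
# would pass the estimates from ADVI or MCMC instead.
predicted, resp = assign_labels(x, weights, means, sds)
accuracy = (predicted == true_labels).mean()
print(f"fraction of points assigned to their true component: {accuracy:.3f}")
```

Since you know the true labels of your simulated data, the fraction of correctly assigned points is a single number you can compute once with the ADVI estimates and once with the MCMC estimates.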

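Note that the loss value itself is uninformative. It is the negative ELBO, so its absolute size says nothing about how good the approximation is, and it is not comparable to anything MCMC reports. It only gives you some sense of whether the fit has converged (i.e., by plotting the loss history).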
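Besides eyeballing the plot, you could do a simple numeric check on the loss history. The sketch below uses a synthetic curve as a stand-in; in PyMC3 the history should be available as the `hist` attribute of the object returned by `pm.fit`. The function name, the window size and the tolerance are arbitrary choices of mine, so treat it as a heuristic and not as a proper convergence test.

```python
import numpy as np


def loss_has_flattened(hist, window=1000, rel_tol=0.01):
    """Heuristic: has the mean loss stopped changing over the last two windows?"""
    hist = np.asarray(hist, dtype=float)
    if hist.size < 2 * window:
        raise ValueError("need at least two full windows of loss values")
    recent = hist[-window:].mean()
    previous = hist[-2 * window:-window].mean()
    return bool(abs(previous - recent) <= rel_tol * abs(previous))


# Synthetic stand-in for a loss history: a decaying curve plus noise.
rng = np.random.default_rng(2)
steps = np.arange(20000)
hist = 325.0 + 500.0 * np.exp(-steps / 1500.0) + rng.normal(0.0, 2.0, steps.size)

flat_at_end = loss_has_flattened(hist)           # after all 20000 iterations
flat_early = loss_has_flattened(hist[:4000])     # after only 4000 iterations
print("flattened after 20000 iterations:", flat_at_end)
print("flattened after 4000 iterations:", flat_early)
```

A flat curve only tells you the optimiser has settled, not that the approximation is close to the true posterior, which is why the comparisons against the known parameters matter more.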

Results for MCMC are: