Thanks for helping me understand the tuning process of ADVI @junpenglao. I'm sorry it took me some time to reply back; I took a break from everything for a few days (it is the festival season in my country).
With this tuning I was able to get the accuracy close to MCMC, yet ADVI took more than 40s whereas MCMC converged within 2s. This still contradicts our expectations, since ADVI is supposed to be faster than MCMC.
I wanted to investigate this further and, more importantly, find out whether the cause of the unreasonably long ADVI convergence time is the difference in the scales of the parameters. Therefore, I generated a very simple dataset with a single predictor variable, so that I could fit a simple linear regression model and keep the task as simple as possible.
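The setup was roughly like the following (a minimal sketch; the priors, sample size, and true parameter values here are placeholders rather than the exact values from my script, but the iteration counts match the progress bars below):

import numpy as np
import pymc3 as pm

rng = np.random.RandomState(0)
n = 1000
x = rng.normal(size=n)                             # single predictor
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)  # assumed true intercept/slope/noise

with pm.Model() as model:
    intercept = pm.Normal("intercept", mu=0, sd=10)
    slope = pm.Normal("slope", mu=0, sd=10)
    sigma = pm.HalfNormal("sigma", sd=1)
    pm.Normal("y_obs", mu=intercept + slope * x, sd=sigma, observed=y)

    # MCMC: 3000 draws + 500 tuning steps gives the 3500-iteration bar below
    trace_mcmc = pm.sample(3000, tune=500, chains=1)

    # ADVI: 10000 optimization steps gives the second bar below
    approx = pm.fit(n=10000, method="advi")
    trace_advi = approx.sample(3000)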
These are the results:
100%|██████████| 3500/3500 [00:04<00:00, 749.04it/s]
Average Loss = 14.753: 100%|██████████| 10000/10000 [00:02<00:00, 3427.40it/s]
Inference : MCMC, wall time : 5882ms, mse: 0.256141
Inference : ADVI, wall time : 7087ms, mse: 0.255874
Inference : VI, wall time : 1ms, mse: 0.257132
Now I have a problem with interpreting the results. In my 5th question, I wanted to know the difference between the time shown next to the ADVI progress bar and the wall time measured by wrapping pm.fit() with start and end time statements. I assumed that the 2s next to the progress bar is the actual time taken by the stochastic optimization, whereas the wall time I measured is the time taken for automatic differentiation + stochastic optimization + other overheads.
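For clarity, this is the kind of timing wrapper I mean (a minimal sketch; `model` refers to the regression model from the snippet above, and the original timing code may differ slightly):

import time

with model:
    start = time.time()
    # the progress bar reports only this optimization loop,
    # not the graph compilation that happens before it starts
    approx = pm.fit(n=10000, method="advi")
    end = time.time()

print("ADVI wall time : %dms" % ((end - start) * 1000))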