How to implement learning rate decay?

I am stuck in a scenario where the learning rate of Adam for ADVI needs to decrease with the number of epochs. As it approaches convergence, the optimizer bounces around a lot, producing very different results from run to run. I want to decay the learning rate as the optimizer converges so that the bouncing is reduced and it settles at a stable optimum. What would be the best way to implement this? Should I modify the code in the library directly and then import it, or is there a better way?
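For illustration, here is a minimal, self-contained sketch of the idea in plain NumPy: an Adam update loop where the learning rate follows an inverse-time decay schedule, `lr_t = lr0 / (1 + decay * t)`. The function name `adam_with_decay`, the schedule, and all hyperparameter values are my own assumptions for the example, not PyMC's API; in PyMC itself one would more likely pass a custom optimizer to `pm.fit` (e.g. via the `obj_optimizer` argument) than hand-roll the loop.

```python
import numpy as np

def adam_with_decay(grad, x0, lr0=0.1, decay=0.01, beta1=0.9,
                    beta2=0.999, eps=1e-8, n_steps=2000):
    """Minimize a function with Adam, decaying the learning rate
    over time as lr_t = lr0 / (1 + decay * t) (inverse-time decay).
    This is an illustrative sketch, not PyMC's implementation."""
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)  # first-moment (mean) estimate
    v = np.zeros_like(x)  # second-moment (uncentered variance) estimate
    for t in range(1, n_steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)  # bias-corrected moments
        v_hat = v / (1 - beta2 ** t)
        lr_t = lr0 / (1 + decay * t)  # decayed learning rate
        x = x - lr_t * m_hat / (np.sqrt(v_hat) + eps)
    return x

# Example: minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = adam_with_decay(lambda x: 2 * (x - 3.0), x0=np.array([0.0]))
print(x_min)  # settles near 3.0 instead of bouncing around it
```

Because `lr_t` shrinks as `t` grows, the late-stage oscillation around the optimum is bounded by the (ever smaller) current step size, which is exactly the stabilizing behavior described above.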

I modified PyMC's Adam code and it worked perfectly fine.
