How does pymc resolves vanishing gradient descent problem

I’m new to Bayesian statistics and pymc. I’m trying to use probabilistic based machine learning instead of usual one and I was wondering how does pymc resolves the vanishing gradient problem when we are fitting a data with time series (basically forecasting). I’m roughly following this Variational Inference: Bayesian Neural Networks — PyMC3 3.10.0 documentation