ADVI in PyMC3 3.2 is approximately two times slower than in 3.1

Hi, the new ADVI is my work. I suppose that the step function is the main training function used for the training iterations. OPVI has been redesigned several times. I think the source of the problem is the large number of theano.clone operations, but without them it was not possible to make the new things work.
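
To check whether the slowdown really sits in the step function rather than in setup, it may help to time it per iteration under both versions. Here is a minimal sketch using only the standard library. The names `time_per_iteration` and `dummy_step` are hypothetical, and `dummy_step` is just a placeholder for the actual compiled step function:

```python
import time


def time_per_iteration(step, n_iter=1000, warmup=10):
    """Return the mean wall-clock seconds per call of `step`."""
    # Warm-up calls keep one-time costs (compilation, caching) out of the measurement.
    for _ in range(warmup):
        step()
    start = time.perf_counter()
    for _ in range(n_iter):
        step()
    return (time.perf_counter() - start) / n_iter


def dummy_step():
    # Hypothetical stand-in for the real training step; replace it with the
    # step function obtained from the inference object being benchmarked.
    return sum(i * i for i in range(100))


if __name__ == "__main__":
    mean_s = time_per_iteration(dummy_step)
    print("mean time per iteration: {:.2e} s".format(mean_s))
```

Running the same script under 3.1 and 3.2 with the real step function in place of `dummy_step` should show whether the per-iteration cost itself roughly doubled.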