- I see a flat posterior distribution when using ADVI. What is happening?
This usually indicates that the gradient with respect to that random variable is zero. It happens when some manipulation of the random variables breaks the smoothness of the model. Even if all your random variables are continuous, some operations produce outputs that are piecewise constant in their inputs, so the gradient through them is zero almost everywhere, for example:
. casting a variable to an integer: c = a.astype('int32')
. using switch: a = pm.math.switch(tau >= X, a1, a2)
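To see why these operations stall gradient-based inference such as ADVI, here is a minimal NumPy sketch (plain NumPy, not PyMC or Theano) that estimates the derivative with respect to tau by central finite differences. The toy values and the names switch_model, cast_model and finite_diff are hypothetical; np.where stands in for pm.math.switch and np.int32 for astype('int32').

```python
import numpy as np

# Hypothetical toy setup: a grid of inputs X and two levels a1, a2.
X = np.linspace(0.0, 100.0, 11)
a1, a2 = 1.0, 5.0

def switch_model(tau):
    # Hard switch, a NumPy stand-in for pm.math.switch(tau >= X, a1, a2)
    return np.where(tau >= X, a1, a2).sum()

def cast_model(tau):
    # Integer cast, a stand-in for a.astype('int32')
    return float(np.int32(tau))

def finite_diff(f, x, eps=1e-4):
    # Central finite-difference approximation of df/dx at x
    return (f(x + eps) - f(x - eps)) / (2 * eps)

tau0 = 37.3  # not on any grid point of X and not an integer
print(finite_diff(switch_model, tau0))  # 0.0
print(finite_diff(cast_model, tau0))    # 0.0
```

Both derivatives are exactly zero, so an optimizer following the gradient gets no signal about where tau should move.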
Breaking the gradient sometimes has unexpected consequences elsewhere, for example poor initialization for NUTS (e.g., see here).
One workaround is reparametrization. For example, instead of using switch:
tau = pm.Uniform('tau', lower=0, upper=100)
a = pm.math.switch(tau >= X, a1, a2)
approximate the switch point with a sigmoid function (here tt is theano.tensor):
tau = pm.Uniform('tau', lower=0, upper=100)
weight = tt.nnet.sigmoid(2 * (tau - X))  # ~1 where tau > X (like the switch), ~0 where tau < X; 2 sets the steepness
a = weight * a1 + (1 - weight) * a2
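A minimal NumPy sketch of this trade-off, assuming hypothetical toy values for a1, a2 and X: it compares the hard switch with the sigmoid blend weight * a1 + (1 - weight) * a2, where weight = sigmoid(k * (tau - X)) so that the weight on a1 approaches 1 where tau >= X. The steepness k (the factor 2 in the snippet above) is treated here as a free parameter, and the sigmoid is written via tanh to avoid overflow warnings.

```python
import numpy as np

# Hypothetical toy values standing in for the model's a1, a2 and X.
a1, a2 = 1.0, 5.0
X = np.linspace(0.0, 100.0, 11)

def sigmoid(z):
    # Numerically stable logistic function: 1 / (1 + exp(-z)) == 0.5 * (1 + tanh(z / 2))
    return 0.5 * (1.0 + np.tanh(z / 2.0))

def hard(tau):
    # NumPy stand-in for pm.math.switch(tau >= X, a1, a2)
    return np.where(tau >= X, a1, a2)

def soft(tau, k):
    # Sigmoid approximation; larger k gives a sharper transition at tau == X
    weight = sigmoid(k * (tau - X))
    return weight * a1 + (1.0 - weight) * a2

def soft_grad(tau, k):
    # Analytic derivative d(soft)/d(tau) = k * s * (1 - s) * (a1 - a2), nonzero for finite k
    s = sigmoid(k * (tau - X))
    return k * s * (1.0 - s) * (a1 - a2)

tau0 = 37.3
for k in (0.5, 2.0, 10.0):
    gap = np.abs(soft(tau0, k) - hard(tau0)).max()
    grad = soft_grad(tau0, k).sum()
    print(f"k={k}: max |soft - hard| = {gap:.3g}, d/dtau = {grad:.3g}")
```

As k grows the sigmoid tracks the hard switch more closely, but the gradient with respect to tau shrinks away from the change point, so a moderate steepness is a compromise between fidelity and a usable gradient.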