Sampling a Gaussian using ADVI

I’m trying to do a "hello world" for the new ADVI interface.

This was my old code, which produced an almost exactly matching variational posterior:

import numpy as np
import pymc3 as pm

data = np.random.randn(100)
with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sd=1, testval=0)
    sd = pm.HalfNormal('sd', sd=1)
    n = pm.Normal('n', mu=mu, sd=sd, observed=data)
advifit = pm.variational.advi(model=model, n=100000)
means, sds, elbo = advifit

With the new interface, I create an ADVI object:

advifit = pm.ADVI(model=model)
advifit.fit(n=10000)
advifit.approx.mean.eval(), advifit.approx.std.eval()

This gives me:
(array([-0.06538046, -0.03497334]), array([ 0.11616796, 0.08825936]))

Is the first array the means of the variational approximations for mu and sd in my model, or is there something going on with the parametrization (sd has a negative mean)? And in general, given the advifit object, what is the officially sanctioned way of getting samples from it? I looked at the quickstart but ended up more confused, and the API docs don't seem to cover the Approximation objects.

Yes, but they are the approximations of the free parameters in the model. PyMC3 automatically transforms bounded parameters to the real line. In this case, sd can only be positive because it is HalfNormal distributed, but for sampling and VI PyMC3 operates on the unbounded (log-transformed) version of it.
You can check which parameters are actually being sampled/approximated with:

model.free_RVs
Out[4]: [mu, sd_log__]
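
To make the transformation concrete, here is a small sketch that maps the numbers reported above for sd_log__ back to the positive scale. It assumes the second entry of each array belongs to sd_log__ (the order shown by model.free_RVs) and that the approximation is a normal on log(sd), so sd itself is log-normal. The variable names are only illustrative, and it uses just the standard library.

```python
import math

# Values reported above for the free variable sd_log__ (second entries of
# approx.mean and approx.std); both live on the log scale.
log_sd_mean = -0.03497334
log_sd_std = 0.08825936

# If log(sd) ~ Normal(m, s), then sd is log-normal:
# its median is exp(m) and its mean is exp(m + s^2 / 2).
sd_median = math.exp(log_sd_mean)
sd_mean = math.exp(log_sd_mean + 0.5 * log_sd_std ** 2)

# Central ~95% interval for sd, mapped back from the log scale.
sd_low = math.exp(log_sd_mean - 1.96 * log_sd_std)
sd_high = math.exp(log_sd_mean + 1.96 * log_sd_std)

print(f"median sd = {sd_median:.3f}, mean sd = {sd_mean:.3f}")
print(f"95% interval = ({sd_low:.3f}, {sd_high:.3f})")
```

A negative mean on the log scale therefore just means that sd is concentrated slightly below 1, which is what you would expect for data drawn from a standard normal.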

You can call advifit.approx.sample(1000), which gives you a trace of 1000 draws from the approximation, just like the trace returned from MCMC sampling.
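
Conceptually, drawing from a mean-field approximation amounts to sampling independent normals in the unbounded space and then mapping each draw back to the original scale. The sketch below imitates that with the standard library only. It is not PyMC3's implementation: sample_meanfield is a hypothetical helper, and the dictionary it returns only mimics a trace.

```python
import math
import random

random.seed(0)

# Mean and std reported above, in the order of model.free_RVs: [mu, sd_log__].
approx_mean = [-0.06538046, -0.03497334]
approx_std = [0.11616796, 0.08825936]


def sample_meanfield(draws):
    """Draw from independent normals in the unbounded space, then map back."""
    trace = {"mu": [], "sd_log__": [], "sd": []}
    for _ in range(draws):
        mu = random.gauss(approx_mean[0], approx_std[0])
        log_sd = random.gauss(approx_mean[1], approx_std[1])
        trace["mu"].append(mu)
        trace["sd_log__"].append(log_sd)
        trace["sd"].append(math.exp(log_sd))  # back-transform, always > 0
    return trace


trace = sample_meanfield(1000)
print(min(trace["sd"]), sum(trace["sd"]) / len(trace["sd"]))
```

Every draw of sd is positive even though the underlying normal has a negative mean, because the exponential is applied after sampling.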

So we are actually fitting a normal on log(sd)? That would make the negative mean a mean of log(sd), correct? That would make sense. I am confused by the rho=log(1+exp(s)) parameter… is that part of this transformation?

Thanks for the “official way to get the samples”!!

You can find more information in the original paper: https://arxiv.org/pdf/1603.00788.pdf. Basically, you want the parameters of the approximation to live on the real line as well, so that a too-large learning rate cannot push the sd of the approximation to an invalid (non-positive) value.
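
A minimal sketch of why an unconstrained parameter helps, assuming the std of the approximation is obtained from rho through a softplus, std = log(1 + exp(rho)). That relation is an assumption here, as are the step size and starting value, which are made up for illustration. Updating the std directly can overshoot into negative values, while any update to rho still maps to a positive std.

```python
import math


def softplus(rho):
    """Map an unconstrained rho to a positive std: log(1 + exp(rho))."""
    # Numerically stable form of log(1 + exp(rho)).
    return max(rho, 0.0) + math.log1p(math.exp(-abs(rho)))


# Hypothetical gradient step, chosen large enough to overshoot zero.
step = 0.5

# Optimizing the std directly: the update can leave the valid region.
std_direct = 0.1
std_direct -= step  # now negative, which is not a valid std

# Optimizing rho instead: every real rho maps to a positive std.
rho = math.log(math.expm1(0.1))  # inverse softplus, so softplus(rho) == 0.1
rho -= step
std_from_rho = softplus(rho)

print(std_direct, std_from_rho)
```

The same step that makes the directly parametrized std negative only shrinks the softplus-parametrized one toward zero.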