Structured Variational Inference


#1

Here are some ideas I had for further VI improvements, centered on grouping latent variables. I can see that more refactoring is needed; for example, support for local variables should be improved too. There we could try a flexible shape for latent variables, so that we can handle the case where local variables are modeled with normalizing flows or other complex distributions. That needs significant API improvements. I feel free to experiment with the VI module and don't shy away from some minor API changes in 3.2.


#2

@bwengals also suggested to introduce more flexible shape for RVs, is that what you meant by “flexible shape for latent variable”?


#3

[from github] The current idea is not possible, because we need a fixed latent dimensionality for the inference approaches MCMC and VI. Flexibility can only be introduced when doing AEVB.

I meant flexibility for local variables. We currently fix their parametrization, and that could be refactored somehow.


#4

I always thought that the local variables and the RVs are linked (e.g., here). How would it work to give the local variables a flexible shape?


#5

I also want to make it possible to broadcast the parametrization for local variables. Imagine a network that predicts a flow parameter for every observation. It is currently not possible to build such an approximation with AEVB, which would be of great interest. Here is a rough sketch of the information a GroupApproximation would need:

class GroupApproximation(object):
    def __init__(self, group=None, indices=None, shapes=None, params=None):

Here I want to specify a group of variables; indices for the variables that are only partially in the group; shapes, with None marking a flexible dim; and params, which will be passed to the approximation in the AEVB case and broadcast later along the flexible dim. The params shape will follow the pattern flexible_shape + fixed_size.
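To make that shape pattern concrete, here is a minimal numpy sketch (the names and sizes below are illustrative, not part of the proposed API): a per-observation parameter tensor of shape flexible_shape + fixed_size broadcasting against a shared fixed-size parameter.

```python
import numpy as np

# Hypothetical shapes: 128 observations (the flexible dim, only known
# at minibatch time) and a fixed latent size of 5 per observation.
flexible_shape = (128,)  # e.g. minibatch size; may vary between iterations
fixed_size = (5,)        # latent dimensionality, fixed by the model

# A network predicting a flow parameter for every observation would
# emit a tensor with shape flexible_shape + fixed_size ...
per_obs_params = np.random.randn(*(flexible_shape + fixed_size))

# ... which broadcasts against a shared parameter of the fixed size.
shared_param = np.zeros(fixed_size)
combined = per_obs_params + shared_param

assert combined.shape == flexible_shape + fixed_size
```

The broadcasting is what lets the same approximation code serve both a single shared parametrization and a per-observation (AEVB-style) one.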


#6

params is a dict mapping each string name in shared_params to a tensor variable with the correct shape.
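For concreteness, a sketch of what such a params dict might look like, using numpy arrays in place of tensor variables (the key names and sizes are illustrative only):

```python
import numpy as np

# Hypothetical params dict: keys are the names the approximation expects
# in shared_params, values are tensors of shape flexible_shape + fixed_size
# (here: 100 observations, latent size 5).
params = {
    "mu": np.zeros((100, 5)),   # per-observation means
    "rho": np.ones((100, 5)),   # per-observation (pre-softplus) scales
}

for name, value in params.items():
    assert value.shape == (100, 5)
```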


#7

I see, yeah it should be possible, and will make a more flexible API.


#8

I see difficulties with additionally taking indices. I'm not sure it will be possible to reconstruct the variable.


#9

I couldn’t find a self-contained example of how to use the improved VI API (in particular, the Group class). I saw a post on @ferrine’s blog, the docstring, and the discussion on GitHub, but it’s just fragments of information without a single complete example.

I would be particularly interested in a neural network approximation similar to what’s done in convolutional VAE or AEVB LDA (I guess both notebooks should be updated to use the new interface). For example, how should I define my_mu in the group approximation? Should I use it in a way similar to local_rvs? @ferrine, I imagine you tested this API on some models, could you perhaps post an example?


#10

After playing around and blindly trying out different combinations I can answer my own question now :smile:. If anybody else is wondering, in the convolutional VAE example you can use the groups with a minimal modification in the following way (and please correct me if I’m doing it wrong!):

# In memory Minibatches for better speed
xs_t_minibatch = pm.Minibatch(data, minibatch_size)

with model:
    group = pm.Group([zs], params=dict(mu=enc.means, rho=enc.rhos), local=True)

    approx = pm.Approximation([group])
    inference = pm.KLqp(approx=approx)

    approx = inference.fit(
        15000,
        more_obj_params=enc.params + dec.params, 
        obj_optimizer=pm.rmsprop(learning_rate=0.001),
        more_replacements={xs_t:xs_t_minibatch},
    )

In any case, a more complex use case for structured VI would be very helpful :slight_smile:.


#11

This is great @pwl! Thanks for the update - PR welcome :wink:

I have been meaning to take a deep dive into the Group approximation, starting with something like a Gaussian mixture with full-rank on some parameters and mean-field on the others (such structure has been used before in the literature). But I can’t yet find the time to do that…
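The structure described above (full-rank over some parameters, mean-field over the rest) amounts to a block-diagonal posterior covariance. This is not the PyMC3 Group API, just a numpy sketch of what that structured Gaussian looks like and how sampling factorizes across the two blocks:

```python
import numpy as np

rng = np.random.default_rng(0)

# Full-rank block over 3 "structured" parameters: arbitrary covariance,
# parametrized via a Cholesky factor with positive diagonal.
L = np.tril(rng.normal(size=(3, 3)))
np.fill_diagonal(L, np.abs(np.diag(L)) + 0.1)
cov_full = L @ L.T

# Mean-field block over 2 remaining parameters: diagonal covariance.
cov_diag = np.diag(rng.uniform(0.5, 1.5, size=2))

# The joint approximate posterior covariance is block-diagonal:
# correlations are captured inside the full-rank block, while the
# mean-field block stays independent of everything else.
joint_cov = np.zeros((5, 5))
joint_cov[:3, :3] = cov_full
joint_cov[3:, 3:] = cov_diag

# Sampling factorizes across the blocks.
z_full = L @ rng.normal(size=3)                           # correlated draw
z_diag = np.sqrt(np.diag(cov_diag)) * rng.normal(size=2)  # independent draw
z = np.concatenate([z_full, z_diag])

assert z.shape == (5,)
assert np.allclose(joint_cov[:3, 3:], 0.0)  # off-diagonal blocks are zero
```

In Group terms, each block would become its own group with its own variational family; the joint approximation is just their product.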