Make predictions with the particles of SVGD


#1

Hi,

In pymc3, how can I make predictions by averaging the particles (models) if I use Stein variational gradient descent for Bayesian neural nets?

In the tutorial, I found the code for GMM:

approx = pm.fit(method=pm.SVGD(n_particles=200, jitter=1.))
trace = approx.sample(10000)

And I also wonder what distribution the second line is sampling. I am confused because unlike ADVI, “approx” in SVGD should return a set of particles instead of a distribution.

best,
Rob


#2

I think it works like using a trace to make prediction. @ferrine?


#3

Yes, @junpenglao is right. We form a kind of trace (empirical distribution) from what we sample particles independently. Thank you for pinging me in this thread. I’ve prepared a gist answering your question.

This is a bit toy example (I had few time), but the code is mostly model agnostic and shows potential problems of 2 approaches to do “model averaging”


#4

If you still have any question (maybe on internals), I’m happy to answer


#5

Thank for the examples. I still have few questions:

we set the number of particles to be 100 in
approx = pm.fit(method=‘svgd’, inf_kwargs={‘n_particles’: 100}),
For model averaging:
Would trace = approx.sample(100) return exactly the values of those particles that I have trained?

But what mout1 = approx.sample_node(out, 10000).mean() is doing?
Drawing 10000 samples from a distribution then average them? so what distribution we are sampling here?

Furthermore, In Bayesian neural net, I want to average the prediction made by the particles(models) that I initialized in training, rather than used the particles to create another distribution from which I draw MC samples


#6

Would trace = approx.sample(100) return exactly the values of those particles that I have trained?

No it will sample from particles uniformly

But what mout1 = approx.sample_node(out, 10000).mean() is doing?

It samples from the empirical distribution and makes appropriate replacements in the graph. So every Distribution node is replaced with it’s posterior distribution (symbolically).

I want to average the prediction made by the particles(models)

This was not implemented. Now I see that it might be useful to integrate over the posterior. The way you can integrate over the posterior looks like

# based mostly on https://github.com/pymc-devs/pymc3/blob/master/pymc3/variational/opvi.py#L1098
# or https://github.com/pymc-devs/pymc3/blob/master/pymc3/variational/opvi.py#L1470
def integrate_over_histogram(approx, node):
    if not isinstance(approx, pm.Empirical):
        raise ValueError('You need empirical distribution here, got {}'.format(type(approx)))
    node = approx.to_flat_input(node)
    def sample(post):
        return theano.clone(node, {approx.input: post})
    nodes, _ = theano.scan(sample, approx.histogram)
    return nodes
# given the notebook above
Esin = integrate_over_histogram(approx, out).mean()

#7

I think that integration of an expression should be implemented in future. What matters here is efficiency, given a chain of 10000 samples it is weird to average over them all. Some smart way of choosing important samples is needed.

UPDATE: added initial implementation here https://github.com/pymc-devs/pymc3/pull/3244