# Make predictions with the particles of SVGD

Hi,

In pymc3, how can I make predictions by averaging the particles (models) if I use Stein variational gradient descent for Bayesian neural nets?

In the tutorial, I found the code for GMM:

```python
approx = pm.fit(method=pm.SVGD(n_particles=200, jitter=1.))
trace = approx.sample(10000)
```

And I also wonder what distribution the second line is sampling from. I am confused because, unlike ADVI, `approx` in SVGD should return a set of particles instead of a distribution.

best,
Rob

I think it works like using a trace to make predictions. @ferrine?

Yes, @junpenglao is right. We form a kind of trace (an empirical distribution) from which we sample particles independently. Thanks for pinging me in this thread. I’ve prepared a gist answering your question.

This is a bit of a toy example (I had little time), but the code is mostly model agnostic and shows the potential problems of two approaches to “model averaging”.
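For intuition on why the two approaches can disagree, here is a plain-NumPy sketch (an illustration under toy assumptions, not the code from the gist): for a nonlinear model output, averaging the per-particle predictions is generally not the same as predicting with the averaged parameters.

```python
import numpy as np

# Hypothetical stand-ins: `particles` plays the role of trained SVGD particles,
# and f(theta) = theta**2 plays the role of a nonlinear model output.
rng = np.random.default_rng(0)
particles = rng.normal(loc=1.0, scale=0.5, size=1000)

def f(theta):
    return theta ** 2  # toy nonlinear "prediction"

# Approach 1: average the per-particle predictions, E[f(theta)]
avg_of_predictions = f(particles).mean()

# Approach 2: predict with the mean particle, f(E[theta])
prediction_of_avg = f(particles.mean())

# By Jensen's inequality the two differ for convex f:
# E[theta^2] = Var(theta) + E[theta]^2 >= E[theta]^2
print(avg_of_predictions, prediction_of_avg)
```

For Bayesian model averaging it is the first quantity (average of predictions) that you usually want.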

If you still have any questions (maybe on internals), I’m happy to answer.

Thanks for the examples. I still have a few questions:

We set the number of particles to 100 in

```python
approx = pm.fit(method='svgd', inf_kwargs={'n_particles': 100})
```

For model averaging: would `trace = approx.sample(100)` return exactly the values of those particles that I have trained?

But what is `mout1 = approx.sample_node(out, 10000).mean()` doing?
Drawing 10000 samples from a distribution and then averaging them? If so, what distribution are we sampling from here?

Furthermore, in a Bayesian neural net, I want to average the predictions made by the particles (models) that I initialized in training, rather than use the particles to create another distribution from which I draw MC samples.

> Would `trace = approx.sample(100)` return exactly the values of those particles that I have trained?

No, it will sample from the particles uniformly.
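The mechanics can be illustrated in plain NumPy (an illustration, not PyMC3 internals): the empirical approximation stores the particles as rows, and sampling draws rows uniformly with replacement, so a draw of size `n_particles` generally repeats some particles and misses others.

```python
import numpy as np

rng = np.random.default_rng(42)
n_particles, n_params = 100, 3
# stand-in for the trained SVGD particles (one row per particle)
particles = rng.normal(size=(n_particles, n_params))

# analogue of approx.sample(100): uniform indices, with replacement
idx = rng.integers(0, n_particles, size=n_particles)
trace = particles[idx]

# every drawn row is one of the stored particles...
assert all((trace[i] == particles[idx[i]]).all() for i in range(n_particles))
# ...but sampling with replacement typically repeats some and skips others
print(len(np.unique(idx)), "distinct particles out of", n_particles)
```

So `approx.sample(100)` does not return each trained particle exactly once.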

> But what is `mout1 = approx.sample_node(out, 10000).mean()` doing?

It samples from the empirical distribution and makes the appropriate replacements in the graph, so every Distribution node is replaced with its posterior distribution (symbolically).

> I want to average the prediction made by the particles (models)

This was not implemented. Now I see that it might be useful to integrate over the posterior. Integrating over the posterior looks like:

```python
import theano
import pymc3 as pm

# based mostly on https://github.com/pymc-devs/pymc3/blob/master/pymc3/variational/opvi.py#L1098
# or https://github.com/pymc-devs/pymc3/blob/master/pymc3/variational/opvi.py#L1470
def integrate_over_histogram(approx, node):
    if not isinstance(approx, pm.Empirical):
        raise ValueError('You need an empirical distribution here, got {}'.format(type(approx)))
    # rewire the node to depend on the approximation's flat input
    node = approx.to_flat_input(node)

    def sample(post):
        # substitute one particle (one row of the histogram) into the graph
        return theano.clone(node, {approx.input: post})

    # evaluate the node once per stored particle
    nodes, _ = theano.scan(sample, approx.histogram)
    return nodes

# given the notebook above
Esin = integrate_over_histogram(approx, out).mean()
```
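For intuition, here is a plain-NumPy analogue of what the scan computes (an illustration with a toy `out` function, not the Theano graph above): the node is evaluated once per stored particle and those deterministic evaluations are averaged, whereas `sample_node` averages over random redraws from the same particles.

```python
import numpy as np

rng = np.random.default_rng(1)
# stand-in for approx.histogram: n_particles x n_params
particles = rng.normal(size=(200, 2))

def out(theta):
    # toy stand-in for the model's output node
    return np.sin(theta).sum()

# scan analogue: evaluate `out` once per particle, then average (exact over particles)
exact = np.mean([out(p) for p in particles])

# sample_node analogue: resample particles uniformly, then average (Monte Carlo)
idx = rng.integers(0, len(particles), size=10000)
mc = np.mean([out(particles[i]) for i in idx])

print(exact, mc)  # the MC estimate fluctuates around the exact particle average
```

Both target the same particle average; the scan version just avoids the extra Monte Carlo noise.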

I think that integration of an expression should be implemented in the future. What matters here is efficiency: given a chain of 10000 samples, it is wasteful to average over them all. Some smart way of choosing important samples is needed.
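One simple heuristic along these lines (a hypothetical NumPy sketch, not what the linked PR implements) is to thin the chain, averaging only every k-th sample, which cuts the number of evaluations k-fold at a modest loss of precision:

```python
import numpy as np

rng = np.random.default_rng(7)
# stand-in for a chain of 10000 posterior samples of a scalar quantity
samples = rng.normal(loc=2.0, scale=1.0, size=10000)

full_mean = samples.mean()      # averages all 10000 samples
thinned = samples[::10]         # keep every 10th sample: 1000 evaluations instead of 10000
thinned_mean = thinned.mean()

print(full_mean, thinned_mean)  # the two estimates should be close
```

Smarter schemes would weight or select samples by importance rather than uniformly.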

UPDATE: added initial implementation here https://github.com/pymc-devs/pymc3/pull/3244
