How can NUTS sample even when it fails to differentiate?

This model learns an optimal value for x.
However, we cannot compute the derivative of s with respect to x. I show a TensorFlow example below (Example 2) to verify this. I am more familiar with TF, so I use it instead of Theano, assuming the two are equally powerful at automatic differentiation.

NUTS is a variant of HMC, and thus needs the first-order derivative of the log posterior to guide the sampling.
So why does it still work here?
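
For reference, here is a schematic of the leapfrog integrator at the heart of HMC/NUTS (a minimal sketch, not PyMC3's actual implementation), showing exactly where the gradient enters:

import numpy as np

def leapfrog(q, p, grad_logp, eps):
    # One leapfrog step: the gradient of the log posterior steers the
    # trajectory; without it, only the momentum moves the position.
    p = p + 0.5 * eps * grad_logp(q)  # half-step on momentum
    q = q + eps * p                   # full step on position
    p = p + 0.5 * eps * grad_logp(q)  # half-step on momentum
    return q, p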

Example 1:

import numpy as np
import pymc3 as pm
import theano
import theano.tensor as tt

with pm.Model() as model:
    x = pm.Normal('x', mu=0, sd=3)

    a = theano.shared(np.array([1., 2., 3., 4., 5.]))
    b = a + x
    c = a[(b > 5)]  # boolean mask: x only affects which entries are selected
    s = tt.sum(c)

    sigma = pm.HalfCauchy('sigma', 2.5)
    y_obs = pm.Normal("s", mu=s, sd=sigma, observed=15)

    trace = pm.sample(500)

Example 2:

import tensorflow as tf

a = tf.constant([1., 2., 3., 4., 5.])
x = tf.constant([3.])
# x = tf.get_variable("x1", initializer=tf.constant([3.]))
b = a + x
c = tf.boolean_mask(a, b > 5)  # x enters only through the boolean condition
s = tf.reduce_sum(c)
ga = tf.gradients(s, [a])  # well-defined: gradient flows to the selected entries
gx = tf.gradients(s, [x])

with tf.Session() as sess:
    print(sess.run(s))
    print(sess.run(ga))
    print(sess.run(gx))  # This fails; same for a variable x.
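
(For the record, tf.gradients(s, [x]) returns [None] here: the only path from x to s goes through the boolean condition b > 5, which has no gradient, and sess.run then rejects the None fetch.)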

Stealing part of Dave’s comment from a TFP issue here:
The way you’ve written the model, with a continuous prior, all RVs are, technically speaking, continuous variables, and it seems to work because HMC resorts to random-walk behavior in the absence of gradients. So while there is no gradient of s w.r.t. x, the sampler follows the initial random momentum in each HMC step, essentially yielding a kind of discrete MH proposal.
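
To make that concrete, here is a minimal sketch (illustrative only) of what an HMC proposal collapses to when the gradient term vanishes:

import numpy as np

rng = np.random.default_rng(0)

def hmc_proposal_without_grad(q, eps=0.1, n_steps=10):
    # With grad(log p) == 0, each leapfrog update reduces to
    # q <- q + eps * p, so the whole trajectory is a straight line
    # along the initial random momentum: a Gaussian random-walk proposal.
    p = rng.standard_normal(np.shape(q))
    return q + n_steps * eps * p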

I see. HMC degenerates to MH in the absence of gradients.
Thank you very much, @junpenglao!

Actually, my answer is not correct.
You can take the gradient w.r.t. x (see the TensorFlow example below).


So my guess is that the posterior is piecewise differentiable; as long as the sampler (i.e., the leapfrog steps) does not land exactly on a discontinuity in the posterior parameter space, everything should work fine.
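
A quick numpy check of that piecewise picture (an illustrative sketch reusing the model's masking):

import numpy as np

a = np.array([1., 2., 3., 4., 5.])

def s_of_x(x):
    # Same masking as in the model: x only decides which entries survive.
    b = a + x
    return a[b > 5].sum()

# s is a step function of x: flat almost everywhere (zero gradient),
# jumping only where an entry of b crosses 5, i.e. at x = 0, 1, 2, 3, 4.
for x in [0.5, 0.9, 1.1, 2.5, 3.5]:
    print(x, s_of_x(x))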
As for the TensorFlow example: since sampling and gradient evaluation use the log-likelihood as the target, you might want to replicate that setup in TensorFlow as well.

Something like:

import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

tf.reset_default_graph()
a = tf.constant([1., 2., 3., 4., 5.])
x = tf.get_variable("x", initializer=tf.constant(3.))
sigma = tf.get_variable("sigma", initializer=tf.constant(0.1))

b = a + x
c = tf.boolean_mask(a, b > 5)
s = tf.reduce_sum(c)

loglike = tfd.Normal(loc=s, scale=sigma).log_prob(15.) + tfd.HalfCauchy(
    loc=0., scale=2.5).log_prob(sigma) + tfd.Normal(loc=0., scale=3.).log_prob(x)

ga = tf.gradients(loglike, [a])
gx = tf.gradients(loglike, [x])

with tf.Session() as sess:
    init = tf.global_variables_initializer()
    sess.run(init)
    print(sess.run(s))
    print(sess.run(ga))
    print(sess.run(gx))  # now well-defined: x appears directly in its prior term
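
For these initial values (x = 3, sigma = 0.1), working it out by hand: s = 12, ga ≈ [0, 0, 300, 300, 300] (the likelihood factor (15 - s) / sigma² = 300 flows only to the masked-in entries of a), and gx ≈ -0.333, which comes entirely from the Normal(0, 3) prior (-x / 3² = -1/3), since ds/dx is zero wherever it is defined.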

Thank you so much for your detailed further investigation, Junpeng.
As I see in your TF example, the gradient of x comes only from its prior, which pulls the sampler toward values of x around zero. By its nature, though, it is still doing model selection in terms of x.

Thanks again for your help! I've really learned a lot of useful skills from you.


That’s a great observation! It would be interesting to run some experiments to see how much it affects the efficiency of HMC sampling.