It seems there is no built-in way to count likelihood (and gradient) evaluations of a custom likelihood function.
@nouiz recommended a work-around wrapper in Theano:
```python
import theano
import pymc3 as pm

all_counters = []  # collect the counters so they can be inspected after sampling

def logpdf(x):
    s = theano.shared(0, name='function_calls')
    s.default_update = s + 1  # incremented on every evaluation of the graph
    all_counters.append(s)
    ll = actual_custom_likelihood(x)
    return ll + s * 0  # keep s in the graph without changing the result

with pm.Model():
    pm.DensityDist('x', logpdf, shape=shape_of_x)
    trace = pm.sample()
```
With Metropolis this seems to work as expected: the shared variable s reads 100 after 100 MCMC iterations. With NUTS there are more evaluations, but oddly no compute-time penalty. The counter also does not distinguish between regular likelihood evaluations and gradient evaluations.

Is there a more direct way to do this in pymc3? Is there a way to count value and gradient evaluations separately?
Hi @rdturner, do you have more information about this use case?
There is trace.tree_size, which gives you the number of gradient/value evaluations per sample in NUTS. It doesn’t typically make sense to think of the grad and value evaluations as different things; they involve a lot of the same ops, and usually the value is needed for the gradient anyway. If you want to be sure, you could put your likelihood in a custom Op, where you can evaluate all the Python code you want (including incrementing a global counter). See http://docs.pymc.io/advanced_theano.html#finding-the-root-of-a-function for an example of defining a custom Op.
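For instance, assuming the trace came from NUTS, summing the per-sample tree_size stat gives a rough total of leapfrog steps, each of which involves one gradient evaluation. A minimal numpy-only sketch (the tree sizes here are made up for illustration, standing in for what trace.tree_size would report):

```python
import numpy as np

# Hypothetical per-sample tree sizes, as trace.tree_size would report
# them for a NUTS run (values made up for illustration).
tree_size = np.array([3, 7, 15, 7, 3])

# Each leapfrog step evaluates the gradient once, so the sum is a
# rough count of gradient evaluations across the whole run.
total_leapfrog_steps = int(tree_size.sum())
print(total_leapfrog_steps)  # → 35
```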
You can also use the theano profiling tools to see what is happening:

```python
func = model.logp_dlogp_function(profile=True)
x = np.random.randn(func.size)
for _ in range(100):
    func(x)  # evaluate logp and its gradient together
func.profile.summary()  # print the collected timing information
```
I tried following the example suggested by @aseyboldt with the following, but it doesn’t seem obvious how to get the desired behavior.
```python
import numpy as np
import theano
import theano.tensor as tt

class CounterWrapper(tt.Op):
    itypes = [tt.dscalar]
    otypes = [tt.dscalar]

    def __init__(self):
        self.n_perform = 0
        self.n_grad = 0

    def perform(self, node, inputs, outputs):
        self.n_perform += 1
        x, = inputs
        outputs[0][0] = np.array(x)  # identity: pass the value through

    def grad(self, inputs, g):
        self.n_grad += 1
        return [g[0]]  # identity: the gradient passes through unchanged

counter_wrapper = CounterWrapper()
x = tt.vector()
y = tt.sum(x ** 2)  # some scalar test function
z = counter_wrapper(y)
gz = tt.grad(z, x)
f1 = theano.function([x], z)
g1 = theano.function([x], gz)
```
It seems that n_perform correctly increments each time f1 is called, but n_grad only counts the number of times tt.grad() was called at graph-construction time, not the number of times g1 is evaluated.
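One way to count value and gradient evaluations separately at call time, rather than at graph-construction time, is to do the bookkeeping in plain Python, outside the graph. A minimal sketch of the idea, with a made-up EvalCounter class and a toy quadratic (not Theano-specific, just illustrating the counting pattern):

```python
class EvalCounter:
    """Wrap a function and its gradient, counting each kind of call."""

    def __init__(self, f, grad_f):
        self.f = f
        self.grad_f = grad_f
        self.n_value = 0
        self.n_grad = 0

    def value(self, x):
        self.n_value += 1  # incremented per value evaluation, not per graph build
        return self.f(x)

    def grad(self, x):
        self.n_grad += 1  # incremented per gradient evaluation
        return self.grad_f(x)

# Toy quadratic with a hand-written gradient.
counter = EvalCounter(lambda x: x ** 2, lambda x: 2 * x)
counter.value(3.0)
counter.value(4.0)
counter.grad(3.0)
print(counter.n_value, counter.n_grad)  # → 2 1
```

For Theano, the analogue would be to route the gradient itself through a counting Op, so that its perform method (not its grad method) does the incrementing.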