Regarding likelihood evaluation in the case of a blackbox likelihood

Hi there,

The demo has been very useful for me to understand how to use an external likelihood function. However, I’m a bit confused about the number of likelihood evaluations: if Metropolis sampling is used, I expect the number of likelihood evaluations to be (almost) the same as the number of draws, since we need one evaluation per Markov step. However, in the attached minimal code, the number of likelihood evaluations is almost four times the number of draws. Could you please give a hint as to why this is?

Regards,
Charlie

The minimal (but complete) code below is only slightly simplified from the demo so that the likelihood is evaluated within the Python environment instead of in Cython.

import pymc3 as pm
import numpy as np
import matplotlib.pyplot as plt

plt.style.use('seaborn-darkgrid')
print('Running on PyMC3 v{}'.format(pm.__version__))

# for reproducibility here's some version info for modules used in this notebook
import theano
import theano.tensor as tt
import platform
# import cython
import IPython
import matplotlib
import os

# global variable to save number of likelihood evaluations
no_of_eva = 0

def my_model(theta, x):
    global no_of_eva
    no_of_eva += 1
    m, c = theta
    return m*x + c


# define your really-complicated likelihood function that uses loads of external codes
def my_loglike(theta, x, data, sigma):
    """
    A Gaussian log-likelihood function for a model with parameters given in theta
    """
    # print("called")

    model = my_model(theta, x)

    return -(0.5/sigma**2)*np.sum((data - model)**2)

class LogLike(tt.Op):

    """
    Specify what type of object will be passed and returned to the Op when it is
    called. In our case we will be passing it a vector of values (the parameters
    that define our model) and returning a single "scalar" value (the
    log-likelihood)
    """
    itypes = [tt.dvector] # expects a vector of parameter values when called
    otypes = [tt.dscalar] # outputs a single scalar value (the log likelihood)

    def __init__(self, loglike, data, x, sigma):
        """
        Initialise the Op with various things that our log-likelihood function
        requires. Below are the things that are needed in this particular
        example.

        Parameters
        ----------
        loglike:
            The log-likelihood (or whatever) function we've defined
        data:
            The "observed" data that our log-likelihood function takes in
        x:
            The independent variable (aka 'x') that our model requires
        sigma:
            The noise standard deviation that our function requires.
        """

        # add inputs as class attributes
        self.likelihood = loglike
        self.data = data
        self.x = x
        self.sigma = sigma

    def perform(self, node, inputs, outputs):
        # the method that is used when calling the Op
        theta, = inputs  # this will contain my variables

        # call the log-likelihood function
        logl = self.likelihood(theta, self.x, self.data, self.sigma)

        outputs[0][0] = np.array(logl) # output the log-likelihood


# set up our data
N = 10  # number of data points
sigma = 1.  # standard deviation of noise
x = np.linspace(0., 9., N)

mtrue = 0.4  # true gradient
ctrue = 3.   # true y-intercept

truemodel = my_model([mtrue, ctrue], x)

# make data
np.random.seed(716742)  # set random seed, so the data is reproducible each time
data = sigma*np.random.randn(N) + truemodel

ndraws = 3000  # number of draws from the distribution
nburn = 1000   # number of "burn-in points" (which we'll discard)

# create our Op
logl = LogLike(my_loglike, data, x, sigma)

# use PyMC3 to sample from the log-likelihood
with pm.Model():
    # uniform priors on m and c
    m = pm.Uniform('m', lower=-10., upper=10.)
    c = pm.Uniform('c', lower=-10., upper=10.)

    # convert m and c to a tensor vector
    theta = tt.as_tensor_variable([m, c])

    # use a DensityDist (use a lambda function to "call" the Op)
    pm.DensityDist('likelihood', lambda v: logl(v), observed={'v': theta})

    step = pm.Metropolis()

    trace = pm.sample(ndraws, tune=nburn, step=step, discard_tuned_samples=True, chains=1, cores=1)

# print number of likelihood evaluations
print("number of Markov steps (ndraws+nburn): %d"%(ndraws+nburn))
print("number of likelihood evaluations: %d"%no_of_eva)

# plot the traces
_ = pm.traceplot(trace, lines={'m': mtrue, 'c': ctrue})
plt.show()

Screen output:

number of Markov steps (ndraws+nburn): 4000
number of likelihood evaluations: 16004
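
As a sanity check on the counting itself (a minimal sketch, run in the same session as the script above and using only names defined there), each direct call to my_loglike should bump the counter by exactly one:

before = no_of_eva
_ = my_loglike([mtrue, ctrue], x, data, sigma)
print(no_of_eva - before)  # expected: 1 (one model evaluation per log-likelihood call)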

I don’t have an answer, but I am intrigued, since I often work with expensive likelihoods and would be worried about extra calls. I will note that the number of evaluations depends on the cores argument: using one core leads to additional calls relative to four cores.

I’m interested in this code example. I’ve been trying to run this exact example, but am coming up with an exception (for a slightly different call).

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Initializing NUTS failed. Falling back to elementwise auto-assignment.
Multiprocess sampling (2 chains in 2 jobs)
CompoundStep
>Slice: [c]
>Slice: [m]
Could not pickle model, sampling singlethreaded.
Sequential sampling (2 chains in 1 job)
CompoundStep
>Slice: [c]
>Slice: [m]

 100.00% [4000/4000 00:03<00:00 Sampling chain 0, 0 divergences]

 100.00% [4000/4000 00:03<00:00 Sampling chain 1, 0 divergences]
Sampling 2 chains for 1_000 tune and 3_000 draw iterations (2_000 + 6_000 draws total) took 9 seconds.
---------------------------------------------------------------------------
MissingInputError                         Traceback (most recent call last)
<ipython-input-8-2463a9902c60> in <module>
     72     # use a DensityDist (use a lamdba function to "call" the Op)
     73     pm.DensityDist('likelihood', lambda v: logl(v), observed={'v': theta})
---> 74     pm.sample(ndraws, tune=nburn, discard_tuned_samples=True, start={'m':0.4, 'c':3})

~/miniconda3/envs/dev3/lib/python3.8/site-packages/pymc3/sampling.py in sample(draws, step, init, n_init, start, trace, chain_idx, chains, cores, tune, progressbar, model, random_seed, discard_tuned_samples, compute_convergence_checks, callback, return_inferencedata, idata_kwargs, **kwargs)
    597         if idata_kwargs:
    598             ikwargs.update(idata_kwargs)
--> 599         idata = arviz.from_pymc3(trace, **ikwargs)
    600 
    601     if compute_convergence_checks:

~/miniconda3/envs/dev3/lib/python3.8/site-packages/arviz/data/io_pymc3.py in from_pymc3(trace, prior, posterior_predictive, log_likelihood, coords, dims, model, save_warmup)
    521     InferenceData
    522     """
--> 523     return PyMC3Converter(
    524         trace=trace,
    525         prior=prior,

~/miniconda3/envs/dev3/lib/python3.8/site-packages/arviz/data/io_pymc3.py in __init__(self, trace, prior, posterior_predictive, log_likelihood, predictions, coords, dims, model, save_warmup)
    157             self.dims = {**model_dims, **self.dims}
    158 
--> 159         self.observations, self.multi_observations = self.find_observations()
    160 
    161     def find_observations(self) -> Tuple[Optional[Dict[str, Var]], Optional[Dict[str, Var]]]:

~/miniconda3/envs/dev3/lib/python3.8/site-packages/arviz/data/io_pymc3.py in find_observations(self)
    170             elif hasattr(obs, "data"):
    171                 for key, val in obs.data.items():
--> 172                     multi_observations[key] = val.eval() if hasattr(val, "eval") else val
    173         return observations, multi_observations
    174 

~/miniconda3/envs/dev3/lib/python3.8/site-packages/theano/gof/graph.py in eval(self, inputs_to_values)
    520         inputs = tuple(sorted(inputs_to_values.keys(), key=id))
    521         if inputs not in self._fn_cache:
--> 522             self._fn_cache[inputs] = theano.function(inputs, self)
    523         args = [inputs_to_values[param] for param in inputs]
    524 

~/miniconda3/envs/dev3/lib/python3.8/site-packages/theano/compile/function.py in function(inputs, outputs, mode, updates, givens, no_default_updates, accept_inplace, name, rebuild_strict, allow_input_downcast, profile, on_unused_input)
    304         # note: pfunc will also call orig_function -- orig_function is
    305         #      a choke point that all compilation must pass through
--> 306         fn = pfunc(params=inputs,
    307                    outputs=outputs,
    308                    mode=mode,

~/miniconda3/envs/dev3/lib/python3.8/site-packages/theano/compile/pfunc.py in pfunc(params, outputs, mode, updates, givens, no_default_updates, accept_inplace, name, rebuild_strict, allow_input_downcast, profile, on_unused_input, output_keys)
    481         inputs.append(si)
    482 
--> 483     return orig_function(inputs, cloned_outputs, mode,
    484                          accept_inplace=accept_inplace, name=name,
    485                          profile=profile, on_unused_input=on_unused_input,

~/miniconda3/envs/dev3/lib/python3.8/site-packages/theano/compile/function_module.py in orig_function(inputs, outputs, mode, accept_inplace, name, profile, on_unused_input, output_keys)
   1830     try:
   1831         Maker = getattr(mode, 'function_maker', FunctionMaker)
-> 1832         m = Maker(inputs,
   1833                   outputs,
   1834                   mode,

~/miniconda3/envs/dev3/lib/python3.8/site-packages/theano/compile/function_module.py in __init__(self, inputs, outputs, mode, accept_inplace, function_builder, profile, on_unused_input, fgraph, output_keys, name)
   1484             # make the fgraph (copies the graph, creates NEW INPUT AND
   1485             # OUTPUT VARIABLES)
-> 1486             fgraph, additional_outputs = std_fgraph(inputs, outputs,
   1487                                                     accept_inplace)
   1488             fgraph.profile = profile

~/miniconda3/envs/dev3/lib/python3.8/site-packages/theano/compile/function_module.py in std_fgraph(input_specs, output_specs, accept_inplace)
    178     orig_outputs = [spec.variable for spec in output_specs] + updates
    179 
--> 180     fgraph = gof.fg.FunctionGraph(orig_inputs, orig_outputs,
    181                                   update_mapping=update_mapping)
    182 

~/miniconda3/envs/dev3/lib/python3.8/site-packages/theano/gof/fg.py in __init__(self, inputs, outputs, features, clone, update_mapping)
    173 
    174         for output in outputs:
--> 175             self.__import_r__(output, reason="init")
    176         for i, output in enumerate(outputs):
    177             output.clients.append(('output', i))

~/miniconda3/envs/dev3/lib/python3.8/site-packages/theano/gof/fg.py in __import_r__(self, variable, reason)
    344         # Imports the owners of the variables
    345         if variable.owner and variable.owner not in self.apply_nodes:
--> 346                 self.__import__(variable.owner, reason=reason)
    347         elif (variable.owner is None and
    348                 not isinstance(variable, graph.Constant) and

~/miniconda3/envs/dev3/lib/python3.8/site-packages/theano/gof/fg.py in __import__(self, apply_node, check, reason)
    389                                      "for more information on this error."
    390                                      % (node.inputs.index(r), str(node)))
--> 391                         raise MissingInputError(error_msg, variable=r)
    392 
    393         for node in new_nodes:

MissingInputError: Input 0 of the graph (indices start from 0), used to compute sigmoid(c_interval__), was not provided and not given a value. Use the Theano flag exception_verbosity='high', for more information on this error.

Just echoing @andyfaff’s comment here. I get this error on this example, but also on other examples where I have put in my own Python function.

Not very good etiquette responding to myself, but the culprit for this issue is the pm.DensityDist line above.

If you use pm.Potential instead of pm.DensityDist, things should work as expected:

    pm.Potential("likelihood", logl(theta))

The reason behind this is explained here, and you can find a full-blown example in this stackoverflow answer.
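
For completeness, here is how the model block from the first post might look with pm.Potential swapped in (a sketch reusing the definitions from the script above; only the likelihood line changes):

with pm.Model():
    # uniform priors on m and c, as before
    m = pm.Uniform('m', lower=-10., upper=10.)
    c = pm.Uniform('c', lower=-10., upper=10.)

    # convert m and c to a tensor vector
    theta = tt.as_tensor_variable([m, c])

    # wrap the black-box Op in a Potential instead of a DensityDist
    pm.Potential('likelihood', logl(theta))

    step = pm.Metropolis()
    trace = pm.sample(ndraws, tune=nburn, step=step,
                      discard_tuned_samples=True, chains=1, cores=1)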