Defining a numeric (custom) likelihood function in PyMC3

(This is a question I asked over at SO but now I think it might be more suited for this forum, so I’m basically re-posting it here. Let me know if this is not advised)

After looking at several questions/answers and PyMC3’s, I’ve managed to create a minimal working example of my MCMC setup (see below).

My fitted parameters are continuous and discrete, so the priors are defined using `pm.Uniform` and `pm.DiscreteUniform` (with a re-scaling applied to the latter). My likelihood function is particularly convoluted (it involves comparing the N-dimensional histograms of my observed data and some synthetic data generated using the free parameters), so I had to write it using `theano`'s `@as_op` operator.

The implementation shown here works on a toy model working on random data, but in my actual model the likelihood and parameters are very similar.

My questions are:

1. Is this correct? Is there anything I should be doing different?
2. The call to the likelihood function is just thrown there apparently doing nothing and connected to nothing. Is this the proper way to do this?
3. I’m using `NUTS` for the continuous parameters but since my likelihood is numeric, I don’t think I should be able to do this. Since the code still runs, I’m nut sure what’s going on.

This is the first time I’ve used `PyMC3` so any pointers will be really helpful.

``````import matplotlib.pyplot as plt
import numpy as np
import pymc3 as pm
import theano.tensor as tt
from theano.compile.ops import as_op

def main():
trace = bayesMCMC()
print(pm.summary(trace))
pm.traceplot(trace)
plt.show()

def bayesMCMC():
"""
Define and process the full model.
"""
with pm.Model() as model:
# Define uniform priors.
A = pm.Uniform("A", lower=0., upper=5.)
B = pm.Uniform("B", lower=10., upper=20.)
C = pm.Uniform("C", lower=0., upper=1.)

# Define discrete priors.
minD, maxD, stepD = 0.005, 0.06, 0.005
ND = int((maxD - minD) / stepD)
D = pm.DiscreteUniform("D", 0., ND)
minE, maxE, stepE = 9., 10., 0.05
NE = int((maxE - minE) / stepE)
E = pm.DiscreteUniform("E", 0., NE)

# Is this correct??
logp(A, B, C, D, E)

step1 = pm.NUTS(vars=[A, B, C])
print("NUTS")
step2 = pm.Metropolis(vars=[D, E])
print("Metropolis")

trace = pm.sample(300, [step1, step2])  # , start)

return trace

@as_op(
itypes=[tt.dscalar, tt.dscalar, tt.dscalar, tt.lscalar, tt.lscalar],
otypes=[tt.dscalar])
def logp(A, B, C, D, E):
"""
Likelihood evaluation.
"""
# Get observed data and some extra info to re-scale the discrete parameters
obsData, minD, stepD, minE, stepE = obsservedData()

# Scale discrete parameters
D, E = D * stepD + minD, E * stepE + minE

# Generate synthetic data using the prior values
synthData = synthetic(A, B, C, D, E)

# Generate N-dimensional histograms for both data sets.
obsHist, edges = np.histogramdd(obsData)
synHist, _ = np.histogramdd(synthData, bins=edges)

# Flatten both histograms
obsHist_f, synHist_f = obsHist.ravel(), synHist.ravel()
# Remove all bins where N_bin=0.
binNzero = obsHist_f != 0
obsHist_f, synHist_f = obsHist_f[binNzero], synHist_f[binNzero]
# Assign small value to the 0 elements in synHist_f to avoid issues with
# the log()
synHist_f[synHist_f == 0] = 0.001

# Compare the histograms of the observed and synthetic data via a Poisson
# likelihood ratio.
lkl = -2. * np.sum(synHist_f - obsHist_f * np.log(synHist_f))

return lkl

def obsservedData():
"""Some 'observed' data."""
np.random.seed(12345)
N = 1000
obsData = np.random.uniform(0., 10., (N, 3))

minD, stepD = 0.005, 0.005
minE, stepE = 9., 0.05

return obsData, minD, stepD, minE, stepE

def synthetic(A, B, C, D, E):
"""
Dummy function to generate synthetic data. The actual function makes use
of the A, B, C, D, E variables (obviously).
"""
M = np.random.randint(100, 1000)
synthData = np.random.uniform(0., 10., (M, 3))
return synthData

if __name__ == "__main__":
main()``````

No it is not correct, as you suspect the logp is not connected to the graph, which means it is not being evaluated during sampling/inferece. Right now you are sampling from the prior. You can use potential to add the logp to the model:

``````with pm.Model() as model:
...
pm.Potential('logp', logp(A, B, C, D, E))

``````

It is allowed as pymc3 will sampled with a metropolis within Gibbs sampler (ie compoundstep), but you should just call `sample()` as Metropolis is actually not suitable for DiscreteUniform (it would give invalid proposal that goes out of bound). More information see http://docs.pymc.io/notebooks/sampling_compound_step.html, especially the last paragraph.

Lastly, I think you can rewrite your logp into theano. There is random generator in theano, and the histogram function could be rewrite into theano as well.

Thank you very much for your answer @junpenglao. I followed your advise and used `pm.Potential('logp', logp(A, B, C, D, E))` and `pm.sample()`.

I get a rather nasty error which I transcribe below:

``````\$ python del.py
Only 300 samples in chain.
Multiprocess sampling (2 chains in 2 jobs)
CompoundStep
>CompoundStep
>>Slice: [C]
>>Slice: [B]
>>Slice: [A]
>CompoundStep
>>Metropolis: [E]
>>Metropolis: [D]
Sampling 2 chains:   0%|                                                         | 0/1600 [00:00<?, ?draws/s]
pymc3.parallel_sampling.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/gabriel/anaconda3/envs/asteca3/lib/python3.7/site-packages/theano/compile/function_module.py", line 903, in __call__
self.fn() if output_subset is None else\
ValueError: expected an ndarray

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/gabriel/anaconda3/envs/asteca3/lib/python3.7/site-packages/pymc3/parallel_sampling.py", line 73, in run
self._start_loop()
File "/home/gabriel/anaconda3/envs/asteca3/lib/python3.7/site-packages/pymc3/parallel_sampling.py", line 113, in _start_loop
point, stats = self._compute_point()
File "/home/gabriel/anaconda3/envs/asteca3/lib/python3.7/site-packages/pymc3/parallel_sampling.py", line 139, in _compute_point
point, stats = self._step_method.step(self._point)
File "/home/gabriel/anaconda3/envs/asteca3/lib/python3.7/site-packages/pymc3/step_methods/compound.py", line 29, in step
point = method.step(point)
File "/home/gabriel/anaconda3/envs/asteca3/lib/python3.7/site-packages/pymc3/step_methods/compound.py", line 33, in step
point = method.step(point)
File "/home/gabriel/anaconda3/envs/asteca3/lib/python3.7/site-packages/pymc3/step_methods/arraystep.py", line 142, in step
apoint = self.astep(bij.map(point), *inputs)
File "/home/gabriel/anaconda3/envs/asteca3/lib/python3.7/site-packages/pymc3/step_methods/slicer.py", line 56, in astep
y = logp(q) - nr.standard_exponential()
File "/home/gabriel/anaconda3/envs/asteca3/lib/python3.7/site-packages/pymc3/blocking.py", line 257, in __call__
return self.fa(self.fb(x))
File "/home/gabriel/anaconda3/envs/asteca3/lib/python3.7/site-packages/pymc3/model.py", line 1107, in __call__
return self.f(**state)
File "/home/gabriel/anaconda3/envs/asteca3/lib/python3.7/site-packages/theano/compile/function_module.py", line 917, in __call__
storage_map=getattr(self.fn, 'storage_map', None))
File "/home/gabriel/anaconda3/envs/asteca3/lib/python3.7/site-packages/theano/gof/link.py", line 325, in raise_with_op
reraise(exc_type, exc_value, exc_trace)
File "/home/gabriel/anaconda3/envs/asteca3/lib/python3.7/site-packages/six.py", line 692, in reraise
raise value.with_traceback(tb)
File "/home/gabriel/anaconda3/envs/asteca3/lib/python3.7/site-packages/theano/compile/function_module.py", line 903, in __call__
self.fn() if output_subset is None else\
ValueError: expected an ndarray
Apply node that caused the error: MakeVector{dtype='float64'}(__logp_A_interval__, __logp_B_interval__, __logp_C_interval__, __logp_D, __logp_E, logp)
Toposort index: 9
Inputs types: [TensorType(float64, scalar), TensorType(float64, scalar), TensorType(float64, scalar), TensorType(float64, scalar), TensorType(float64, scalar), TensorType(float64, scalar)]
Inputs shapes: [(), (), (), (), (), ()]
Inputs strides: [(), (), (), (), (), ()]
Inputs values: [array(-1.38629436), array(-1.38629436), array(-1.38629436), array(-2.48490665), array(-3.04452244), -7993.053447180414]
Outputs clients: [[Sum{acc_dtype=float64}(MakeVector{dtype='float64'}.0)]]

Backtrace when the node is created(use Theano flag traceback.limit=N to make it longer):
File "del.py", line 38, in bayesMCMC
trace = pm.sample(300)
File "/home/gabriel/anaconda3/envs/asteca3/lib/python3.7/site-packages/pymc3/sampling.py", line 415, in sample
step = assign_step_methods(model, step, step_kwargs=step_kwargs)
File "/home/gabriel/anaconda3/envs/asteca3/lib/python3.7/site-packages/pymc3/sampling.py", line 151, in assign_step_methods
return instantiate_steppers(model, steps, selected_steps, step_kwargs)
File "/home/gabriel/anaconda3/envs/asteca3/lib/python3.7/site-packages/pymc3/sampling.py", line 72, in instantiate_steppers
step = step_class(vars=vars, **args)
File "/home/gabriel/anaconda3/envs/asteca3/lib/python3.7/site-packages/pymc3/step_methods/arraystep.py", line 65, in __new__
step.__init__([var], *args, **kwargs)
File "/home/gabriel/anaconda3/envs/asteca3/lib/python3.7/site-packages/pymc3/step_methods/slicer.py", line 47, in __init__
super(Slice, self).__init__(vars, [self.model.fastlogp], **kwargs)
File "/home/gabriel/anaconda3/envs/asteca3/lib/python3.7/site-packages/pymc3/model.py", line 233, in fastlogp
return self.model.fastfn(self.logpt)
File "/home/gabriel/anaconda3/envs/asteca3/lib/python3.7/site-packages/pymc3/model.py", line 716, in logpt
logp = tt.sum([tt.sum(factor) for factor in factors])

HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
"""

The above exception was the direct cause of the following exception:

ValueError: expected an ndarray
Apply node that caused the error: MakeVector{dtype='float64'}(__logp_A_interval__, __logp_B_interval__, __logp_C_interval__, __logp_D, __logp_E, logp)
Toposort index: 9
Inputs types: [TensorType(float64, scalar), TensorType(float64, scalar), TensorType(float64, scalar), TensorType(float64, scalar), TensorType(float64, scalar), TensorType(float64, scalar)]
Inputs shapes: [(), (), (), (), (), ()]
Inputs strides: [(), (), (), (), (), ()]
Inputs values: [array(-1.38629436), array(-1.38629436), array(-1.38629436), array(-2.48490665), array(-3.04452244), -7993.053447180414]
Outputs clients: [[Sum{acc_dtype=float64}(MakeVector{dtype='float64'}.0)]]

Backtrace when the node is created(use Theano flag traceback.limit=N to make it longer):
File "del.py", line 38, in bayesMCMC
trace = pm.sample(300)
File "/home/gabriel/anaconda3/envs/asteca3/lib/python3.7/site-packages/pymc3/sampling.py", line 415, in sample
step = assign_step_methods(model, step, step_kwargs=step_kwargs)
File "/home/gabriel/anaconda3/envs/asteca3/lib/python3.7/site-packages/pymc3/sampling.py", line 151, in assign_step_methods
return instantiate_steppers(model, steps, selected_steps, step_kwargs)
File "/home/gabriel/anaconda3/envs/asteca3/lib/python3.7/site-packages/pymc3/sampling.py", line 72, in instantiate_steppers
step = step_class(vars=vars, **args)
File "/home/gabriel/anaconda3/envs/asteca3/lib/python3.7/site-packages/pymc3/step_methods/arraystep.py", line 65, in __new__
step.__init__([var], *args, **kwargs)
File "/home/gabriel/anaconda3/envs/asteca3/lib/python3.7/site-packages/pymc3/step_methods/slicer.py", line 47, in __init__
super(Slice, self).__init__(vars, [self.model.fastlogp], **kwargs)
File "/home/gabriel/anaconda3/envs/asteca3/lib/python3.7/site-packages/pymc3/model.py", line 233, in fastlogp
return self.model.fastfn(self.logpt)
File "/home/gabriel/anaconda3/envs/asteca3/lib/python3.7/site-packages/pymc3/model.py", line 716, in logpt
logp = tt.sum([tt.sum(factor) for factor in factors])

HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "del.py", line 102, in <module>
main()
File "del.py", line 12, in main
trace = bayesMCMC()
File "del.py", line 38, in bayesMCMC
trace = pm.sample(300)
File "/home/gabriel/anaconda3/envs/asteca3/lib/python3.7/site-packages/pymc3/sampling.py", line 449, in sample
trace = _mp_sample(**sample_args)
File "/home/gabriel/anaconda3/envs/asteca3/lib/python3.7/site-packages/pymc3/sampling.py", line 999, in _mp_sample
for draw in sampler:
File "/home/gabriel/anaconda3/envs/asteca3/lib/python3.7/site-packages/pymc3/parallel_sampling.py", line 305, in __iter__