Using a PyTorch neural network as a custom “black box” likelihood function

Hi, as suggested by the title, I am using a neural network model as a custom likelihood function. My data comes from FEM simulations, and an ANN is trained on the simulation data as a surrogate model. In the PyMC model, the surrogate model is wrapped in the custom likelihood function, following the "Using a 'black box' likelihood function" example in the Gallery.

I think the logic is fine, because the log probability is calculated correctly by the following code:

# create our Op
loglike_op = LogLike()
test_out = loglike_op(awid_true, bdep_true, cf_true, cr_true, eta_true, data)
print(test_out.eval())

However, I suspect that the random variables generated by pm.Uniform() cannot be fed into the neural network model due to data type mismatches.

My full code is below:

### define the neural network structure 
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(5, 50)
        self.fc2 = nn.Linear(50, 100)
        self.fc3 = nn.Linear(100, 50)
        self.fc4 = nn.Linear(50, 3)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = torch.relu(self.fc3(x))
        x = self.fc4(x)
        return x
mynet = Net()
## load the trained model parameters 
mynet.load_state_dict(torch.load('./trained_model.pth', weights_only=True))
mynet.eval()

def my_loglike(awid, bdep, cf, cr, eta, data):
    # stack the five scalar parameters into one input vector for the surrogate model
    inputs = np.asarray([awid, bdep, cf, cr, eta], dtype=np.float32)
    inputs_tensor = torch.from_numpy(inputs).float()

    # the network expects a batch dimension
    if len(inputs_tensor.shape) == 1:
        inputs_tensor = inputs_tensor.unsqueeze(0)

    model_output = mynet(inputs_tensor)
    model_output = model_output.detach().numpy().squeeze()

    # relative residuals between the observations and the surrogate prediction
    residuals = np.abs((data - model_output) / data)
    return np.log(residuals + 1.0)

# define a pytensor Op for our likelihood function
class LogLike(Op):
    def make_node(self, awid, bdep, cf, cr, eta, data) -> Apply:
        awid = pt.as_tensor(awid)
        bdep = pt.as_tensor(bdep)
        cf = pt.as_tensor(cf)
        cr = pt.as_tensor(cr)
        eta = pt.as_tensor(eta)
        data = pt.as_tensor(data)

        inputs = [awid, bdep, cf, cr, eta, data]
        outputs = [data.type()]

        # Apply is an object that combines inputs, outputs and an Op (self)
        return Apply(self, inputs, outputs)

    def perform(self, node: Apply, inputs: list[np.ndarray], outputs: list[list[None]]) -> None:
        awid, bdep, cf, cr, eta, data = inputs  # this will contain my variables

        # call our numpy log-likelihood function
        loglike_eval = my_loglike(awid, bdep, cf, cr, eta, data)
        outputs[0][0] = np.asarray(loglike_eval)

## data 
data = np.asarray([7.82e-02, 9.96e-02, 8.69e-02])
awid_true = 0.000000
bdep_true = 0.750000
cf_true = 0.000000
cr_true = 0.750000
eta_true = 0.666250
# create our Op
loglike_op = LogLike()
test_out = loglike_op(awid_true, bdep_true, cf_true, cr_true, eta_true, data)
print(test_out.eval())

def custom_dist_loglike(data, awid, bdep, cf, cr, eta):
    # data, or observed is always passed as the first input of CustomDist
    return loglike_op(awid, bdep, cf, cr, eta, data)

# use PyMC to sample from the log-likelihood
with pm.Model() as no_grad_model:
    # uniform priors 
    awid = pm.Uniform("awid", lower=0.0, upper=1.0, initval=0.5)
    bdep = pm.Uniform("bdep", lower=0.0, upper=1.0, initval=0.5)
    cf = pm.Uniform("cf", lower=0.0, upper=1.0, initval=0.5)
    cr = pm.Uniform("cr", lower=0.0, upper=1.0, initval=0.5)
    eta = pm.Uniform("eta", lower=0.0, upper=1.0, initval=0.5)

    # use a CustomDist with a custom logp function
    likelihood = pm.CustomDist(
        "likelihood", awid, bdep, cf, cr, eta, observed=data, logp=custom_dist_loglike
    )

ip = no_grad_model.initial_point()
print(no_grad_model.compile_logp(vars=[likelihood], sum=False)(ip))

A snippet of the error from the command terminal is,

  File "c:\Users\GTJCY\Documents\AITEST\V8\pymy_ML.py", line 22, in forward
    x = torch.relu(self.fc1(x))
                   ~~~~~~~~^^^
  File "C:\Users\GTJCY\.conda\envs\pymc_env\Lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "C:\Users\GTJCY\.conda\envs\pymc_env\Lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\GTJCY\.conda\envs\pymc_env\Lib\site-packages\torch\nn\modules\linear.py", line 125, in forward
    return F.linear(input, self.weight, self.bias)
           ~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: mat1 and mat2 shapes cannot be multiplied (5x1 and 5x50)

RuntimeError: mat1 and mat2 shapes cannot be multiplied (5x1 and 5x50)
Apply node that caused the error: LogLike(Sigmoid.0, Sigmoid.0, Sigmoid.0, Sigmoid.0, Sigmoid.0, likelihood{[0.0782 0. ... 96 0.0869]})
Toposort index: 10
Inputs types: [TensorType(float64, shape=(1,)), TensorType(float64, shape=(1,)), TensorType(float64, shape=(1,)), TensorType(float64, shape=(1,)), TensorType(float64, shape=(1,)), TensorType(float64, shape=(3,))]
Inputs shapes: [(1,), (1,), (1,), (1,), (1,), (3,)]
Inputs strides: [(8,), (8,), (8,), (8,), (8,), (8,)]
Inputs values: [array([0.5]), array([0.5]), array([0.5]), array([0.5]), array([0.5]), array([0.0782, 0.0996, 0.0869])]
Outputs clients: [[output[0](likelihood_logprob)]]

Thanks.

OK, I think I found the problem: it is the data type difference. I noticed in other posts that PyTensor is suggested for building the neural network, but I can't find any relevant examples. Please give a hint.

Your torch function doesn’t seem to like the fact that PyMC expands the scalar arguments to be 1d (shape=(1,) instead of shape=()). Try to squeeze them in the perform method before passing them to your torch function.
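
For example, a minimal, untested sketch of what that could look like in your LogLike.perform, using the names from your code above:

    def perform(self, node: Apply, inputs: list[np.ndarray], outputs: list[list[None]]) -> None:
        # PyMC passes the scalar parameters as shape (1,) arrays here,
        # so squeeze them back to scalars before calling the torch surrogate
        awid, bdep, cf, cr, eta, data = (
            np.squeeze(v) if v.shape == (1,) else v for v in inputs
        )

        loglike_eval = my_loglike(awid, bdep, cf, cr, eta, data)
        outputs[0][0] = np.asarray(loglike_eval)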


It works. Thank you.
BTW, why is the PyTorch model compatible here but not in other posts, e.g., How to combine pymc with neural network - #2 by jessegrabowski?

The working code is below:

        processed_params = [
            param.squeeze() if param.shape == (1,) else param for param in inputs
        ]
        awid, bdep, cf, cr, eta, data = processed_params

You wrapped your NN in an Op, so it seems totally consistent with what I suggested in the post you linked. Am I missing something?

Hi. At first I misunderstood the first option. I tried it, but the results look bad: the effective sample size is too small and the four chains do not converge. I'm wondering if the results would get better with the second option, a PyTensor-defined NN model, perhaps because NUTS could then be used thanks to the gradient?

You can also define a gradient Op that gives you the pytorch gradients, since pytorch is also auto-differentiable. It’s shown in the blackbox examples.
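
Roughly something like this, as an untested sketch: it assumes the whole log-likelihood (including the residual term, not just the network) is rewritten in torch so that autograd can see it, and it follows the structure of the gradient Op in the blackbox notebook. The names torch_loglike and LogLikeGrad are just placeholders:

import numpy as np
import torch
import pytensor.tensor as pt
from pytensor.graph import Apply, Op

def torch_loglike(params: torch.Tensor, data: torch.Tensor) -> torch.Tensor:
    # same log-likelihood as my_loglike above, but written entirely in torch
    model_output = mynet(params.unsqueeze(0)).squeeze()
    residuals = torch.abs((data - model_output) / data)
    return torch.log(residuals + 1.0)

class LogLikeGrad(Op):
    # returns d(loglike_i)/d(param_j) as one vector per scalar parameter

    def make_node(self, awid, bdep, cf, cr, eta, data) -> Apply:
        inputs = [pt.as_tensor(v) for v in (awid, bdep, cf, cr, eta, data)]
        # one gradient vector (same length as data) per scalar parameter
        outputs = [inputs[-1].type() for _ in range(5)]
        return Apply(self, inputs, outputs)

    def perform(self, node, inputs, outputs) -> None:
        *params, data = inputs
        params_tensor = torch.tensor(np.asarray(params, dtype=np.float32).squeeze())
        data_tensor = torch.as_tensor(np.asarray(data, dtype=np.float32))

        # Jacobian of the log-likelihood vector wrt the 5 parameters, shape (len(data), 5)
        jac = torch.autograd.functional.jacobian(
            lambda p: torch_loglike(p, data_tensor), params_tensor
        )
        jac = jac.detach().numpy().astype(node.outputs[0].dtype)
        for j in range(5):
            outputs[j][0] = jac[:, j]

The grad method of your LogLike Op would then combine these vectors with the output gradient (pt.sum(out_grad * grad_wrt_param) for each parameter), as in the blackbox notebook, which should let PyMC assign NUTS instead of Slice.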


Hi. I get the error “NotImplementedError: Gradient only implemented for scalar m and c” when I run the example with the gradient. I think something is wrong with the definition of LogLikeGrad(Op).
The default sampling method is still Slice.

Can you post the code you are actually running?

Hi, sorry for the late response. I downloaded the example from GitHub here: pymc-examples/examples/howto/blackbox_external_likelihood_numpy.ipynb at main · pymc-devs/pymc-examples · GitHub

The error appears when I run this cell.

The last error message is,
NotImplementedError: Gradient only implemented for scalar m and c

Thanks.

Please post the entire traceback in a code block. Tracebacks are not spam, they are essential for debugging. You can make a code block by using triple backticks (this one ` 3 times) above and below a block of text, like this.

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
Cell In[30], line 1
----> 1 grad_model.compile_dlogp()(ip)

File ~\.conda\envs\pymc_env\Lib\site-packages\pymc\model\core.py:620, in Model.compile_dlogp(self, vars, jacobian, **compile_kwargs)
    604 def compile_dlogp(
    605     self,
    606     vars: Variable | Sequence[Variable] | None = None,
    607     jacobian: bool = True,
    608     **compile_kwargs,
    609 ) -> PointFunc:
    610     """Compiled log probability density gradient function.
    611 
    612     Parameters
   (...)    618         Whether to include jacobian terms in logprob graph. Defaults to True.
    619     """
--> 620     return self.compile_fn(self.dlogp(vars=vars, jacobian=jacobian), **compile_kwargs)

File ~\.conda\envs\pymc_env\Lib\site-packages\pymc\model\core.py:760, in Model.dlogp(self, vars, jacobian)
    758 cost = self.logp(jacobian=jacobian)
    759 cost = rewrite_pregrad(cost)
--> 760 return gradient(cost, value_vars)

File ~\.conda\envs\pymc_env\Lib\site-packages\pymc\pytensorf.py:328, in gradient(f, vars)
    325     vars = cont_inputs(f)
    327 if vars:
--> 328     return pt.concatenate([gradient1(f, v) for v in vars], axis=0)
    329 else:
    330     return empty_gradient

File ~\.conda\envs\pymc_env\Lib\site-packages\pymc\pytensorf.py:317, in gradient1(f, v)
    315 def gradient1(f, v):
    316     """Flat gradient of f wrt v."""
--> 317     return pt.flatten(grad(f, v, disconnected_inputs="warn"))

File ~\.conda\envs\pymc_env\Lib\site-packages\pytensor\gradient.py:747, in grad(cost, wrt, consider_constant, disconnected_inputs, add_names, known_grads, return_disconnected, null_gradients)
    744     if hasattr(g.type, "dtype"):
    745         assert g.type.dtype in pytensor.tensor.type.float_dtypes
--> 747 _rval: Sequence[Variable] = _populate_grad_dict(
    748     var_to_app_to_idx, grad_dict, _wrt, cost_name
    749 )
    751 rval: MutableSequence[Variable | None] = list(_rval)
    753 for i in range(len(_rval)):

File ~\.conda\envs\pymc_env\Lib\site-packages\pytensor\gradient.py:1541, in _populate_grad_dict(var_to_app_to_idx, grad_dict, wrt, cost_name)
   1538     # end if cache miss
   1539     return grad_dict[var]
-> 1541 rval = [access_grad_cache(elem) for elem in wrt]
   1543 return rval

File ~\.conda\envs\pymc_env\Lib\site-packages\pytensor\gradient.py:1496, in _populate_grad_dict.<locals>.access_grad_cache(var)
   1494 for node in node_to_idx:
   1495     for idx in node_to_idx[node]:
-> 1496         term = access_term_cache(node)[idx]
   1498         if not isinstance(term, Variable):
   1499             raise TypeError(
   1500                 f"{node.op}.grad returned {type(term)}, expected"
   1501                 " Variable instance."
   1502             )

File ~\.conda\envs\pymc_env\Lib\site-packages\pytensor\gradient.py:1171, in _populate_grad_dict.<locals>.access_term_cache(node)
   1168 if node not in term_dict:
   1169     inputs = node.inputs
-> 1171     output_grads = [access_grad_cache(var) for var in node.outputs]
   1173     # list of bools indicating if each output is connected to the cost
   1174     outputs_connected = [
   1175         not isinstance(g.type, DisconnectedType) for g in output_grads
   1176     ]

File ~\.conda\envs\pymc_env\Lib\site-packages\pytensor\gradient.py:1496, in _populate_grad_dict.<locals>.access_grad_cache(var)
   1494 for node in node_to_idx:
   1495     for idx in node_to_idx[node]:
-> 1496         term = access_term_cache(node)[idx]
   1498         if not isinstance(term, Variable):
   1499             raise TypeError(
   1500                 f"{node.op}.grad returned {type(term)}, expected"
   1501                 " Variable instance."
   1502             )

File ~\.conda\envs\pymc_env\Lib\site-packages\pytensor\gradient.py:1171, in _populate_grad_dict.<locals>.access_term_cache(node)
   1168 if node not in term_dict:
   1169     inputs = node.inputs
-> 1171     output_grads = [access_grad_cache(var) for var in node.outputs]
   1173     # list of bools indicating if each output is connected to the cost
   1174     outputs_connected = [
   1175         not isinstance(g.type, DisconnectedType) for g in output_grads
   1176     ]

    [... skipping similar frames: _populate_grad_dict.<locals>.access_grad_cache at line 1496 (2 times), _populate_grad_dict.<locals>.access_term_cache at line 1171 (1 times)]

File ~\.conda\envs\pymc_env\Lib\site-packages\pytensor\gradient.py:1171, in _populate_grad_dict.<locals>.access_term_cache(node)
   1168 if node not in term_dict:
   1169     inputs = node.inputs
-> 1171     output_grads = [access_grad_cache(var) for var in node.outputs]
   1173     # list of bools indicating if each output is connected to the cost
   1174     outputs_connected = [
   1175         not isinstance(g.type, DisconnectedType) for g in output_grads
   1176     ]

File ~\.conda\envs\pymc_env\Lib\site-packages\pytensor\gradient.py:1496, in _populate_grad_dict.<locals>.access_grad_cache(var)
   1494 for node in node_to_idx:
   1495     for idx in node_to_idx[node]:
-> 1496         term = access_term_cache(node)[idx]
   1498         if not isinstance(term, Variable):
   1499             raise TypeError(
   1500                 f"{node.op}.grad returned {type(term)}, expected"
   1501                 " Variable instance."
   1502             )

File ~\.conda\envs\pymc_env\Lib\site-packages\pytensor\gradient.py:1326, in _populate_grad_dict.<locals>.access_term_cache(node)
   1318         if o_shape != g_shape:
   1319             raise ValueError(
   1320                 "Got a gradient of shape "
   1321                 + str(o_shape)
   1322                 + " on an output of shape "
   1323                 + str(g_shape)
   1324             )
-> 1326 input_grads = node.op.L_op(inputs, node.outputs, new_output_grads)
   1328 if input_grads is None:
   1329     raise TypeError(
   1330         f"{node.op}.grad returned NoneType, expected iterable."
   1331     )

File ~\.conda\envs\pymc_env\Lib\site-packages\pytensor\graph\op.py:398, in Op.L_op(self, inputs, outputs, output_grads)
    371 def L_op(
    372     self,
    373     inputs: Sequence[Variable],
    374     outputs: Sequence[Variable],
    375     output_grads: Sequence[Variable],
    376 ) -> list[Variable]:
    377     r"""Construct a graph for the L-operator.
    378 
    379     The L-operator computes a row vector times the Jacobian.
   (...)    396 
    397     """
--> 398     return self.grad(inputs, output_grads)

Cell In[22], line 32, in LogLikeWithGrad.grad(self, inputs, g)
     30 # Our gradient expression assumes these are scalar parameters
     31 if m.type.ndim != 0 or c.type.ndim != 0:
---> 32     raise NotImplementedError("Gradient only implemented for scalar m and c")
     34 grad_wrt_m, grad_wrt_c = loglikegrad_op(m, c, sigma, x, data)
     36 # out_grad is a tensor of gradients of the Op outputs wrt to the function cost

NotImplementedError: Gradient only implemented for scalar m and c

Are m and c scalars?

I just downloaded this example and ran it on my computer. I suspect there’s a mistake in the code. pymc-examples/examples/howto/blackbox_external_likelihood_numpy.ipynb at main · pymc-devs/pymc-examples · GitHub

I see, the notebook probably has to be updated because the parameters now have an explicit expand_dims due to the observations being a vector

Hi, I’m wondering if there’s someone assigned to handle this. This situation can be frustrating for beginners like me.

No, but if you want to take a stab that would be great

Hi, I tried but failed. I still need some help.

I get the error of “ValueError: LogLikeWithGrad.grad returned a term with 0 dimensions, but 1 are required.”

As defined in LogLikeWithGrad(Op), the grad() method returns a vector, so why does it return a term with 0 dimensions? I put a breakpoint at the line just before the return statement, and all the variables are unevaluated, e.g. grad_wrt_m = LogLikeGrad.0 and grad_wrt_c = LogLikeGrad.1. The perform() method of LogLikeGrad(Op) does not seem to be executed.
Below is a section of the code,

The grad() method of LogLikeWithGrad(Op) is,

def grad(
    self, inputs: list[pt.TensorVariable], g: list[pt.TensorVariable]
) -> list[pt.TensorVariable]:
    m, c, sigma, x, data = inputs
    grad_wrt_m, grad_wrt_c = loglikegrad_op(m, c, sigma, x, data)
    [out_grad] = g

    # ### grad_wrt_m and grad_wrt_c are not evaluated before the return statement:
    # ### grad_wrt_m = LogLikeGrad.0, grad_wrt_c = LogLikeGrad.1

    return [
        pt.sum(out_grad * grad_wrt_m),
        pt.sum(out_grad * grad_wrt_c),
        # (out_grad * grad_wrt_m),
        # (out_grad * grad_wrt_c),
        pytensor.gradient.grad_not_implemented(self, 2, sigma),
        pytensor.gradient.grad_not_implemented(self, 3, x),
        pytensor.gradient.grad_not_implemented(self, 4, data),
    ]

The perform() method of LogLikeGrad(Op) is,

def perform(self, node: Apply, inputs: list[np.ndarray], outputs: list[list[None]]) -> None:
    m, c, sigma, x, data = inputs

    # calculate gradients
    grad_wrt_m, grad_wrt_c = finite_differences_loglike(m, c, sigma, x, data)

    outputs[0][0] = grad_wrt_m
    outputs[1][0] = grad_wrt_c

This returns a scalar; you should return something with the same shape as m (and likewise for c). You can check m.type.ndim: the thing you return should have as many dimensions as that.
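
In code, something along these lines (an untested sketch, assuming m and c now have ndim == 1 because of the expand_dims mentioned earlier):

    return [
        # sum over the data dimension, but keep the result at the same ndim as m and c
        pt.sum(out_grad * grad_wrt_m, keepdims=True) if m.type.ndim == 1
        else pt.sum(out_grad * grad_wrt_m),
        pt.sum(out_grad * grad_wrt_c, keepdims=True) if c.type.ndim == 1
        else pt.sum(out_grad * grad_wrt_c),
        pytensor.gradient.grad_not_implemented(self, 2, sigma),
        pytensor.gradient.grad_not_implemented(self, 3, x),
        pytensor.gradient.grad_not_implemented(self, 4, data),
    ]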