Using a PyTorch neural network as a custom “black box” likelihood function

Hi, as suggested by the title, I am using a neural network model as a custom likelihood function. My data comes from FEM simulations, and an ANN is trained on the simulation data as a surrogate model. In the PyMC model, the surrogate model is wrapped in the custom likelihood function, following the "Using a 'black box' likelihood function" example in the Gallery.

I think the logic is fine, because the log probability is calculated correctly by the following code:

# create our Op
loglike_op = LogLike()
test_out = loglike_op(awid_true, bdep_true, cf_true, cr_true, eta_true, data)
print(test_out.eval())

However, I suspect that the random variables generated by pm.Uniform() cannot be fed into the neural network model due to data type mismatches.

My full code is below:

### define the neural network structure 
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(5, 50)
        self.fc2 = nn.Linear(50, 100)
        self.fc3 = nn.Linear(100, 50)
        self.fc4 = nn.Linear(50, 3)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = torch.relu(self.fc3(x))
        x = self.fc4(x)
        return x
mynet = Net()
## load the trained model parameters 
mynet.load_state_dict(torch.load('./trained_model.pth', weights_only=True))
mynet.eval()

def my_loglike(awid, bdep, cf, cr, eta, data):
    # stack the five scalar parameters into one input vector for the surrogate model
    inputs = np.asarray([awid, bdep, cf, cr, eta], dtype=np.float32)
    inputs_tensor = torch.from_numpy(inputs).float()

    # the network expects a batch dimension
    if len(inputs_tensor.shape) == 1:
        inputs_tensor = inputs_tensor.unsqueeze(0)

    model_output = mynet(inputs_tensor)
    model_output = model_output.detach().numpy().squeeze()

    # relative residuals between the observations and the surrogate prediction
    residuals = np.abs((data - model_output) / data)
    return np.log(residuals + 1.0)

# define a pytensor Op for our likelihood function
class LogLike(Op):
    def make_node(self, awid, bdep, cf, cr, eta, data) -> Apply:
        awid = pt.as_tensor(awid)
        bdep = pt.as_tensor(bdep)
        cf = pt.as_tensor(cf)
        cr = pt.as_tensor(cr)
        eta = pt.as_tensor(eta)
        data = pt.as_tensor(data)

        inputs = [awid, bdep, cf, cr, eta, data]
        outputs = [data.type()]

        # Apply is an object that combines inputs, outputs and an Op (self)
        return Apply(self, inputs, outputs)

    def perform(self, node: Apply, inputs: list[np.ndarray], outputs: list[list[None]]) -> None:
        awid, bdep, cf, cr, eta, data = inputs  # this will contain my variables

        # call our numpy log-likelihood function
        loglike_eval = my_loglike(awid, bdep, cf, cr, eta, data)
        outputs[0][0] = np.asarray(loglike_eval)

## data 
data = np.asarray([7.82e-02, 9.96e-02, 8.69e-02])
awid_true = 0.000000
bdep_true = 0.750000
cf_true = 0.000000
cr_true = 0.750000
eta_true = 0.666250
# create our Op
loglike_op = LogLike()
test_out = loglike_op(awid_true, bdep_true, cf_true, cr_true, eta_true, data)
print(test_out.eval())

def custom_dist_loglike(data, awid, bdep, cf, cr, eta):
    # data, or observed is always passed as the first input of CustomDist
    return loglike_op(awid, bdep, cf, cr, eta, data)

# use PyMC to sample from the log-likelihood
with pm.Model() as no_grad_model:
    # uniform priors 
    awid = pm.Uniform("awid", lower=0.0, upper=1.0, initval=0.5)
    bdep = pm.Uniform("bdep", lower=0.0, upper=1.0, initval=0.5)
    cf = pm.Uniform("cf", lower=0.0, upper=1.0, initval=0.5)
    cr = pm.Uniform("cr", lower=0.0, upper=1.0, initval=0.5)
    eta = pm.Uniform("eta", lower=0.0, upper=1.0, initval=0.5)

    # use a CustomDist with a custom logp function
    likelihood = pm.CustomDist(
        "likelihood", awid, bdep, cf, cr, eta, observed=data, logp=custom_dist_loglike
    )

ip = no_grad_model.initial_point()
print(no_grad_model.compile_logp(vars=[likelihood], sum=False)(ip))

A snippet of the error from the command terminal is,

  File "c:\Users\GTJCY\Documents\AITEST\V8\pymy_ML.py", line 22, in forward
    x = torch.relu(self.fc1(x))
                   ~~~~~~~~^^^
  File "C:\Users\GTJCY\.conda\envs\pymc_env\Lib\site-packages\torch\nn\modules\module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "C:\Users\GTJCY\.conda\envs\pymc_env\Lib\site-packages\torch\nn\modules\module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\GTJCY\.conda\envs\pymc_env\Lib\site-packages\torch\nn\modules\linear.py", line 125, in forward
    return F.linear(input, self.weight, self.bias)
           ~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: mat1 and mat2 shapes cannot be multiplied (5x1 and 5x50)

RuntimeError: mat1 and mat2 shapes cannot be multiplied (5x1 and 5x50)
Apply node that caused the error: LogLike(Sigmoid.0, Sigmoid.0, Sigmoid.0, Sigmoid.0, Sigmoid.0, likelihood{[0.0782 0. ... 96 0.0869]})
Toposort index: 10
Inputs types: [TensorType(float64, shape=(1,)), TensorType(float64, shape=(1,)), TensorType(float64, shape=(1,)), TensorType(float64, shape=(1,)), TensorType(float64, shape=(1,)), TensorType(float64, shape=(3,))]
Inputs shapes: [(1,), (1,), (1,), (1,), (1,), (3,)]
Inputs strides: [(8,), (8,), (8,), (8,), (8,), (8,)]
Inputs values: [array([0.5]), array([0.5]), array([0.5]), array([0.5]), array([0.5]), array([0.0782, 0.0996, 0.0869])]
Outputs clients: [[output[0](likelihood_logprob)]]

Thanks.

OK, I think I found the problem: it is the data type difference. I noticed in other posts that PyTensor is suggested for building the neural network, but I can't find any relevant examples. Please give a hint.

Your torch function doesn’t seem to like the fact that PyMC expands the scalar arguments to be 1d (shape=(1,) instead of shape=()). Try to squeeze them in the perform method before passing them to your torch function.
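
For example, a minimal, untested sketch of what that could look like in your LogLike.perform, using the names from your code above:

    def perform(self, node: Apply, inputs: list[np.ndarray], outputs: list[list[None]]) -> None:
        # PyMC passes the scalar parameters as shape (1,) arrays here,
        # so squeeze them back to scalars before calling the torch surrogate
        awid, bdep, cf, cr, eta, data = (
            np.squeeze(v) if v.shape == (1,) else v for v in inputs
        )

        loglike_eval = my_loglike(awid, bdep, cf, cr, eta, data)
        outputs[0][0] = np.asarray(loglike_eval)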


It works. Thank you.
BTW, why is the PyTorch model compatible here but not in other posts, e.g., How to combine pymc with neural network - #2 by jessegrabowski?

The working code is below:

        processed_params = [
            param.squeeze() if param.shape == (1,) else param for param in inputs
        ]
        awid, bdep, cf, cr, eta, data = processed_params

You wrapped your NN in an Op, so it seems totally consistent with what I suggested in the post you linked. Am I missing something?

Hi. At first I misunderstood the first option. I tried it, but the results look bad: the effective sample size is too small and the four chains do not converge. I'm wondering if the results would get better with the second option, a PyTensor-defined NN model, perhaps because NUTS could then be used thanks to the gradient?

You can also define a gradient Op that gives you the pytorch gradients, since pytorch is also auto-differentiable. It’s shown in the blackbox examples.
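
Roughly something like this, as an untested sketch: it assumes the whole log-likelihood (including the residual term, not just the network) is rewritten in torch so that autograd can see it, and it follows the structure of the gradient Op in the blackbox notebook. The names torch_loglike and LogLikeGrad are just placeholders:

import numpy as np
import torch
import pytensor.tensor as pt
from pytensor.graph import Apply, Op

def torch_loglike(params: torch.Tensor, data: torch.Tensor) -> torch.Tensor:
    # same log-likelihood as my_loglike above, but written entirely in torch
    model_output = mynet(params.unsqueeze(0)).squeeze()
    residuals = torch.abs((data - model_output) / data)
    return torch.log(residuals + 1.0)

class LogLikeGrad(Op):
    # returns d(loglike_i)/d(param_j) as one vector per scalar parameter

    def make_node(self, awid, bdep, cf, cr, eta, data) -> Apply:
        inputs = [pt.as_tensor(v) for v in (awid, bdep, cf, cr, eta, data)]
        # one gradient vector (same length as data) per scalar parameter
        outputs = [inputs[-1].type() for _ in range(5)]
        return Apply(self, inputs, outputs)

    def perform(self, node, inputs, outputs) -> None:
        *params, data = inputs
        params_tensor = torch.tensor(np.asarray(params, dtype=np.float32).squeeze())
        data_tensor = torch.as_tensor(np.asarray(data, dtype=np.float32))

        # Jacobian of the log-likelihood vector wrt the 5 parameters, shape (len(data), 5)
        jac = torch.autograd.functional.jacobian(
            lambda p: torch_loglike(p, data_tensor), params_tensor
        )
        jac = jac.detach().numpy().astype(node.outputs[0].dtype)
        for j in range(5):
            outputs[j][0] = jac[:, j]

The grad method of your LogLike Op would then combine these vectors with the output gradient (pt.sum(out_grad * grad_wrt_param) for each parameter), as in the blackbox notebook, which should let PyMC assign NUTS instead of Slice.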


Hi. I get the error “NotImplementedError: Gradient only implemented for scalar m and c” when I run the example with the gradient. I think something is wrong with the definition of LogLikeGrad(Op).
The default sampling method is still Slice.

Can you post the code you are actually running?

Hi, sorry for the late response. I downloaded the example from GitHub here: pymc-examples/examples/howto/blackbox_external_likelihood_numpy.ipynb at main · pymc-devs/pymc-examples · GitHub

The error appears when I run this cell.

The last error message is,
NotImplementedError: Gradient only implemented for scalar m and c

Thanks.

Please post the entire traceback in a code block. Tracebacks are not spam, they are essential for debugging. You can make a code block by using triple backticks (this one ` 3 times) above and below a block of text, like this.

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
Cell In[30], line 1
----> 1 grad_model.compile_dlogp()(ip)

File ~\.conda\envs\pymc_env\Lib\site-packages\pymc\model\core.py:620, in Model.compile_dlogp(self, vars, jacobian, **compile_kwargs)
    604 def compile_dlogp(
    605     self,
    606     vars: Variable | Sequence[Variable] | None = None,
    607     jacobian: bool = True,
    608     **compile_kwargs,
    609 ) -> PointFunc:
    610     """Compiled log probability density gradient function.
    611 
    612     Parameters
   (...)    618         Whether to include jacobian terms in logprob graph. Defaults to True.
    619     """
--> 620     return self.compile_fn(self.dlogp(vars=vars, jacobian=jacobian), **compile_kwargs)

File ~\.conda\envs\pymc_env\Lib\site-packages\pymc\model\core.py:760, in Model.dlogp(self, vars, jacobian)
    758 cost = self.logp(jacobian=jacobian)
    759 cost = rewrite_pregrad(cost)
--> 760 return gradient(cost, value_vars)

File ~\.conda\envs\pymc_env\Lib\site-packages\pymc\pytensorf.py:328, in gradient(f, vars)
    325     vars = cont_inputs(f)
    327 if vars:
--> 328     return pt.concatenate([gradient1(f, v) for v in vars], axis=0)
    329 else:
    330     return empty_gradient

File ~\.conda\envs\pymc_env\Lib\site-packages\pymc\pytensorf.py:317, in gradient1(f, v)
    315 def gradient1(f, v):
    316     """Flat gradient of f wrt v."""
--> 317     return pt.flatten(grad(f, v, disconnected_inputs="warn"))

File ~\.conda\envs\pymc_env\Lib\site-packages\pytensor\gradient.py:747, in grad(cost, wrt, consider_constant, disconnected_inputs, add_names, known_grads, return_disconnected, null_gradients)
    744     if hasattr(g.type, "dtype"):
    745         assert g.type.dtype in pytensor.tensor.type.float_dtypes
--> 747 _rval: Sequence[Variable] = _populate_grad_dict(
    748     var_to_app_to_idx, grad_dict, _wrt, cost_name
    749 )
    751 rval: MutableSequence[Variable | None] = list(_rval)
    753 for i in range(len(_rval)):

File ~\.conda\envs\pymc_env\Lib\site-packages\pytensor\gradient.py:1541, in _populate_grad_dict(var_to_app_to_idx, grad_dict, wrt, cost_name)
   1538     # end if cache miss
   1539     return grad_dict[var]
-> 1541 rval = [access_grad_cache(elem) for elem in wrt]
   1543 return rval

File ~\.conda\envs\pymc_env\Lib\site-packages\pytensor\gradient.py:1496, in _populate_grad_dict.<locals>.access_grad_cache(var)
   1494 for node in node_to_idx:
   1495     for idx in node_to_idx[node]:
-> 1496         term = access_term_cache(node)[idx]
   1498         if not isinstance(term, Variable):
   1499             raise TypeError(
   1500                 f"{node.op}.grad returned {type(term)}, expected"
   1501                 " Variable instance."
   1502             )

File ~\.conda\envs\pymc_env\Lib\site-packages\pytensor\gradient.py:1171, in _populate_grad_dict.<locals>.access_term_cache(node)
   1168 if node not in term_dict:
   1169     inputs = node.inputs
-> 1171     output_grads = [access_grad_cache(var) for var in node.outputs]
   1173     # list of bools indicating if each output is connected to the cost
   1174     outputs_connected = [
   1175         not isinstance(g.type, DisconnectedType) for g in output_grads
   1176     ]

File ~\.conda\envs\pymc_env\Lib\site-packages\pytensor\gradient.py:1496, in _populate_grad_dict.<locals>.access_grad_cache(var)
   1494 for node in node_to_idx:
   1495     for idx in node_to_idx[node]:
-> 1496         term = access_term_cache(node)[idx]
   1498         if not isinstance(term, Variable):
   1499             raise TypeError(
   1500                 f"{node.op}.grad returned {type(term)}, expected"
   1501                 " Variable instance."
   1502             )

File ~\.conda\envs\pymc_env\Lib\site-packages\pytensor\gradient.py:1171, in _populate_grad_dict.<locals>.access_term_cache(node)
   1168 if node not in term_dict:
   1169     inputs = node.inputs
-> 1171     output_grads = [access_grad_cache(var) for var in node.outputs]
   1173     # list of bools indicating if each output is connected to the cost
   1174     outputs_connected = [
   1175         not isinstance(g.type, DisconnectedType) for g in output_grads
   1176     ]

    [... skipping similar frames: _populate_grad_dict.<locals>.access_grad_cache at line 1496 (2 times), _populate_grad_dict.<locals>.access_term_cache at line 1171 (1 times)]

File ~\.conda\envs\pymc_env\Lib\site-packages\pytensor\gradient.py:1171, in _populate_grad_dict.<locals>.access_term_cache(node)
   1168 if node not in term_dict:
   1169     inputs = node.inputs
-> 1171     output_grads = [access_grad_cache(var) for var in node.outputs]
   1173     # list of bools indicating if each output is connected to the cost
   1174     outputs_connected = [
   1175         not isinstance(g.type, DisconnectedType) for g in output_grads
   1176     ]

File ~\.conda\envs\pymc_env\Lib\site-packages\pytensor\gradient.py:1496, in _populate_grad_dict.<locals>.access_grad_cache(var)
   1494 for node in node_to_idx:
   1495     for idx in node_to_idx[node]:
-> 1496         term = access_term_cache(node)[idx]
   1498         if not isinstance(term, Variable):
   1499             raise TypeError(
   1500                 f"{node.op}.grad returned {type(term)}, expected"
   1501                 " Variable instance."
   1502             )

File ~\.conda\envs\pymc_env\Lib\site-packages\pytensor\gradient.py:1326, in _populate_grad_dict.<locals>.access_term_cache(node)
   1318         if o_shape != g_shape:
   1319             raise ValueError(
   1320                 "Got a gradient of shape "
   1321                 + str(o_shape)
   1322                 + " on an output of shape "
   1323                 + str(g_shape)
   1324             )
-> 1326 input_grads = node.op.L_op(inputs, node.outputs, new_output_grads)
   1328 if input_grads is None:
   1329     raise TypeError(
   1330         f"{node.op}.grad returned NoneType, expected iterable."
   1331     )

File ~\.conda\envs\pymc_env\Lib\site-packages\pytensor\graph\op.py:398, in Op.L_op(self, inputs, outputs, output_grads)
    371 def L_op(
    372     self,
    373     inputs: Sequence[Variable],
    374     outputs: Sequence[Variable],
    375     output_grads: Sequence[Variable],
    376 ) -> list[Variable]:
    377     r"""Construct a graph for the L-operator.
    378 
    379     The L-operator computes a row vector times the Jacobian.
   (...)    396 
    397     """
--> 398     return self.grad(inputs, output_grads)

Cell In[22], line 32, in LogLikeWithGrad.grad(self, inputs, g)
     30 # Our gradient expression assumes these are scalar parameters
     31 if m.type.ndim != 0 or c.type.ndim != 0:
---> 32     raise NotImplementedError("Gradient only implemented for scalar m and c")
     34 grad_wrt_m, grad_wrt_c = loglikegrad_op(m, c, sigma, x, data)
     36 # out_grad is a tensor of gradients of the Op outputs wrt to the function cost

NotImplementedError: Gradient only implemented for scalar m and c

Are m and c scalars?

I just downloaded this example and ran it on my computer. I suspect there’s a mistake in the code. pymc-examples/examples/howto/blackbox_external_likelihood_numpy.ipynb at main · pymc-devs/pymc-examples · GitHub

I see, the notebook probably has to be updated because the parameters now have an explicit expand_dims due to the observations being a vector

Hi, I’m wondering if there’s someone assigned to handle this. This situation can be frustrating for beginners like me.

No, but if you want to take a stab that would be great

Hi, I tried but failed. I still need some help.

I get the error of “ValueError: LogLikeWithGrad.grad returned a term with 0 dimensions, but 1 are required.”

As defined in LogLikeWithGrad(Op), the grad() method returns a vector, so why does it return a term with 0 dimensions? I put a breakpoint at the line just before the return statement, and all the variables are unevaluated, e.g. grad_wrt_m = LogLikeGrad.0 and grad_wrt_c = LogLikeGrad.1. The perform() method of LogLikeGrad(Op) does not seem to be executed.
Below is a section of the code,

The grad() method of LogLikeWithGrad(Op) is,

def grad(
    self, inputs: list[pt.TensorVariable], g: list[pt.TensorVariable]
) -> list[pt.TensorVariable]:
    m, c, sigma, x, data = inputs
    grad_wrt_m, grad_wrt_c = loglikegrad_op(m, c, sigma, x, data)
    [out_grad] = g

    # ### grad_wrt_m and grad_wrt_c are not evaluated before the return statement:
    # ### grad_wrt_m = LogLikeGrad.0, grad_wrt_c = LogLikeGrad.1

    return [
        pt.sum(out_grad * grad_wrt_m),
        pt.sum(out_grad * grad_wrt_c),
        # (out_grad * grad_wrt_m),
        # (out_grad * grad_wrt_c),
        pytensor.gradient.grad_not_implemented(self, 2, sigma),
        pytensor.gradient.grad_not_implemented(self, 3, x),
        pytensor.gradient.grad_not_implemented(self, 4, data),
    ]

The perform() method of LogLikeGrad(Op) is,

def perform(self, node: Apply, inputs: list[np.ndarray], outputs: list[list[None]]) -> None:
    m, c, sigma, x, data = inputs

    # calculate gradients
    grad_wrt_m, grad_wrt_c = finite_differences_loglike(m, c, sigma, x, data)

    outputs[0][0] = grad_wrt_m
    outputs[1][0] = grad_wrt_c

This returns a scalar; you should return something with the same shape as m (and likewise for c). You can check m.type.ndim: the thing you return should have as many dimensions as that.
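
In code, something along these lines (an untested sketch, assuming m and c now have ndim == 1 because of the expand_dims mentioned earlier):

    return [
        # sum over the data dimension, but keep the result at the same ndim as m and c
        pt.sum(out_grad * grad_wrt_m, keepdims=True) if m.type.ndim == 1
        else pt.sum(out_grad * grad_wrt_m),
        pt.sum(out_grad * grad_wrt_c, keepdims=True) if c.type.ndim == 1
        else pt.sum(out_grad * grad_wrt_c),
        pytensor.gradient.grad_not_implemented(self, 2, sigma),
        pytensor.gradient.grad_not_implemented(self, 3, x),
        pytensor.gradient.grad_not_implemented(self, 4, data),
    ]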