Hello everyone,

I am trying to create a simple Bayesian network with two parents and a child. I initialize the parents as Categorical distributions with Dirichlet priors. To generate the likelihood, I multiply the vector of parent values by a Normal-distributed matrix of shape (2, 3), where 2 is the number of parents and 3 is the number of child categories.

By taking this likelihood-function-as-a-matrix approach, I hoped to make the shapes compatible: the two parent values form a 1×2 vector, the Normal-distributed likelihood is a 2×3 matrix, and their product is a 1×3 vector that can serve as the initial probabilities for the Categorical child node. For now I am not trying to learn the likelihood from observed data; I'm just trying to get the shapes right.
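For concreteness, here is the shape arithmetic I am aiming for, as a plain NumPy sketch with made-up placeholder values (not the actual model quantities):

```
import numpy as np

# Made-up stand-ins for the model quantities, just to check shapes:
parents = np.array([[1, 2]])   # 1x2 row of parent category values
like_func = np.zeros((2, 3))   # 2x3 Normal-distributed likelihood matrix
child_p = parents @ like_func  # (1, 2) @ (2, 3) -> (1, 3)
print(child_p.shape)           # (1, 3)
```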

```
import pymc as pm
import aesara.tensor as at

with pm.Model() as model:
    parent_num_cat = 3
    parent1_prior = pm.Dirichlet("parent1_prior", a=[1] * parent_num_cat, shape=(parent_num_cat,))
    parent1 = pm.Categorical("parent1", p=parent1_prior)
    parent2_prior = pm.Dirichlet("parent2_prior", a=[1] * parent_num_cat, shape=(parent_num_cat,))
    parent2 = pm.Categorical("parent2", p=parent2_prior)
    parents = [parent1, parent2]

    child_num_cat = 3
    like_func = pm.Normal("likelihood_child", 0, 50, shape=(len(parents), child_num_cat))
    mu = at.tensordot(at.transpose(parents), like_func, axes=1)
    child = pm.Categorical("child", p=mu)

    trace = pm.sample(100, cores=1)
```

However, when running this code, the child variable's log-probability always evaluates to -inf at the starting point and sampling fails.

```
SamplingError: Initial evaluation of model at starting point failed!
Starting values:
{'parent1_prior_simplex__': array([0., 0.]), 'parent1': array(0), 'parent2_prior_simplex__': array([0., 0.]), 'parent2': array(0), 'likelihood_child': array([[0., 0., 0.],
[0., 0., 0.]]), 'child': array(0)}
Initial evaluation results:
{'parent1_prior': -1.5, 'parent1': -1.1, 'parent2_prior': -1.5, 'parent2': -1.1, 'likelihood_child': -28.99, 'child': -inf}
```

I have tried fiddling with this in several different ways and observed that sampling works when the child's `p` is a plain vector (e.g. `[1, 1, 1]`), but fails when it is built from a list of tensors. I think it should be possible to use a matrix as a likelihood function between parents and a child, but I'm worried that I might be doing something fundamentally wrong here to cause this sampling failure.
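My current suspicion (an assumption on my part, not something I have confirmed in the PyMC internals) is that the Normal-derived `mu` can contain negative entries, so it is not a valid probability vector for `pm.Categorical`. A NumPy sketch of the symptom, plus one possible remedy (softmax-style normalization onto the simplex):

```
import numpy as np

rng = np.random.default_rng(0)

# A hypothetical mu produced by parents (1x2) times a Normal(0, 50) matrix (2x3):
like_func = rng.normal(0, 50, size=(2, 3))
mu = np.array([1, 2]) @ like_func  # shape (3,); entries can be negative

# mu is generally not a valid probability vector (negative entries,
# does not sum to 1), which would explain the -inf logp.

# One way to map an unconstrained vector onto the simplex: softmax.
p = np.exp(mu - mu.max())  # subtract the max for numerical stability
p = p / p.sum()            # nonnegative entries summing to 1
```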