Initial evaluation of child variable fails with -inf

Hello everyone,

I am trying to build a simple Bayesian network with two parents and one child. I initialize the parents as Categorical distributions with Dirichlet priors. To generate the likelihood, I multiply the stacked parent values by a matrix of Normal random variables with shape (2, 3), where 2 is the number of parents and 3 is the number of child categories.

With this likelihood-function-as-a-matrix approach, I hoped to make the shapes compatible: the two parents stacked together form a 1x2 row vector, the normally distributed likelihood is a 2x3 matrix, and their product is a 1x3 vector that can serve as the probability vector of the Categorical child node. For now I am not trying to learn the likelihood from observed data; I’m just focusing on getting the shapes right.
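
To make the intended shape algebra concrete, here is a plain-NumPy sketch of the same multiplication (the numbers below are made up purely for illustration):

import numpy as np

parents_row = np.array([[0.0, 1.0]])            # 1x2: the two parent values
like_matrix = np.random.normal(0, 50, (2, 3))   # 2x3: "likelihood" matrix
child_p = parents_row @ like_matrix             # 1x3: intended child probabilities
print(child_p.shape)                            # (1, 3)

And here is the actual model: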

import pymc as pm
import aesara.tensor as at

with pm.Model() as model:

    parent_num_cat = 3

    # Parents: Categorical variables with Dirichlet priors
    parent1_prior = pm.Dirichlet("parent1_prior", a=[1] * parent_num_cat, shape=(parent_num_cat,))
    parent1 = pm.Categorical("parent1", p=parent1_prior)
    parent2_prior = pm.Dirichlet("parent2_prior", a=[1] * parent_num_cat, shape=(parent_num_cat,))
    parent2 = pm.Categorical("parent2", p=parent2_prior)

    parents = [parent1, parent2]

    child_num_cat = 3
    # "Likelihood function" as a (num_parents x child_num_cat) matrix of Normals
    like_func = pm.Normal("likelihood_child", 0, 50, shape=(len(parents), child_num_cat))
    # (1x2) parent vector times (2x3) matrix -> length-3 vector for the child
    mu = at.tensordot(at.transpose(parents), like_func, axes=1)
    child = pm.Categorical("child", p=mu)

    trace = pm.sample(100, cores=1)

However, when I run this code, the child variable’s log-probability evaluates to -inf at the starting point and sampling fails:

SamplingError: Initial evaluation of model at starting point failed!
Starting values:
{'parent1_prior_simplex__': array([0., 0.]), 'parent1': array(0), 'parent2_prior_simplex__': array([0., 0.]), 'parent2': array(0), 'likelihood_child': array([[0., 0., 0.],
       [0., 0., 0.]]), 'child': array(0)}

Initial evaluation results:
{'parent1_prior': -1.5, 'parent1': -1.1, 'parent2_prior': -1.5, 'parent2': -1.1, 'likelihood_child': -28.99, 'child': -inf}

I have tried fiddling with this in several different ways and observed that sampling works when the input to the child is a plain vector (e.g. [1, 1, 1]) but fails when it is built from a list of tensors. I think it should be possible to use a matrix as a likelihood function between parents and a child, but I’m worried that I might be doing something fundamentally wrong here that causes this sampling failure.

You’re probably generating invalid probabilities for the Categorical distribution. They must be nonnegative and sum to 1 along the last (rightmost) dimension. Your mu is a dot product with unconstrained Normal draws, so it can easily contain negative entries or fail to sum to 1, which makes the Categorical log-probability -inf.
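
For example, one simple way to turn the unconstrained mu into a valid probability vector is a softmax. This is just a sketch of the idea (applied to the mu from your model), not necessarily the parameterization you ultimately want:

# assuming `import aesara.tensor as at` and the `mu` built inside the model above
e = at.exp(mu - mu.max())                   # subtract the max for numerical stability
p_child = e / e.sum()                       # softmax: nonnegative, sums to 1
child = pm.Categorical("child", p=p_child)

Whether a softmax of Normal weights is the likelihood you actually want is a separate modeling question; the sketch only addresses the nonnegativity and sum-to-1 constraints that the -inf is complaining about.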
