Hi there,

In my case, I have a multinomial diagnosis variable taking values {“A”, “B”, “C”}. Each diagnosis has its own set of possible scores, represented by a multinomial score variable. For example, “A” has the score domain {1, 2}, while “B” has {1, 2, 3, 4}. Each record takes one diagnosis value and one corresponding score value (e.g. (“A”, 2)). The categorical probabilities (parameters) of both multinomials are unknown, and all of them have to be learned from data.
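To make the data layout concrete, here is a minimal generative sketch in plain NumPy (not PyMC3): the diagnosis is drawn first, then a score from that diagnosis's own distribution, so each diagnosis can have a score domain of a different length. The domain for “C” and the flat Dirichlet priors are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Score domains per diagnosis. "A" and "B" follow the post;
# the domain for "C" is a hypothetical placeholder.
score_domains = {"A": [1, 2], "B": [1, 2, 3, 4], "C": [1, 2, 3]}
diagnoses = list(score_domains)

# Parent: one categorical distribution over diagnoses, drawn from a flat Dirichlet.
p_diagnosis = rng.dirichlet(np.ones(len(diagnoses)))

# Child: one categorical distribution per diagnosis, each with its own length.
p_score = {d: rng.dirichlet(np.ones(len(s))) for d, s in score_domains.items()}

def sample_records(n):
    """Ancestral sampling: draw the diagnosis first, then a score given that diagnosis."""
    records = []
    for _ in range(n):
        d = diagnoses[rng.choice(len(diagnoses), p=p_diagnosis)]
        s = score_domains[d][rng.choice(len(score_domains[d]), p=p_score[d])]
        records.append((d, s))
    return records

records = sample_records(5)
```

Each element of `records` is a (diagnosis, score) pair whose score always lies in that diagnosis's domain.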

In Bayesian network terms, the score variable (child) is conditioned on the diagnosis variable (parent). In the textbook approach, these categorical probabilities are estimated by counting and normalizing the data, with a Dirichlet prior at both the parent and the child level. How can this parent-child causal relationship be set up correctly in PyMC3? (This is actually my second question.) In our case, the multinomial variables are inputs (X), along with other continuous variables (conditionally independent of the multinomials), used for classifying treatments and then estimating treatment distributions.
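The textbook counting approach can be sketched in NumPy as follows, assuming a symmetric Dirichlet(1) prior: the posterior mean of each categorical is (counts + alpha) / (total + alpha * K). The parent is estimated from diagnosis counts, and the child is estimated separately within each diagnosis (one Dirichlet per parent value). The records and the “C” score domain are hypothetical toy values.

```python
import numpy as np

# Hypothetical (diagnosis, score) records, for illustration only.
records = [("A", 2), ("A", 1), ("A", 2), ("B", 4), ("B", 1), ("C", 3)]
score_domains = {"A": [1, 2], "B": [1, 2, 3, 4], "C": [1, 2, 3]}  # "C" domain assumed
diagnoses = list(score_domains)

def dirichlet_posterior_mean(counts, alpha=1.0):
    """Posterior mean of categorical probabilities under a symmetric Dirichlet(alpha) prior."""
    counts = np.asarray(counts, dtype=float)
    return (counts + alpha) / (counts.sum() + alpha * len(counts))

# Parent level: count each diagnosis.
diag_counts = [sum(d == k for d, _ in records) for k in diagnoses]
p_diagnosis_hat = dirichlet_posterior_mean(diag_counts)

# Child level: count scores only among records with that diagnosis.
p_score_hat = {
    k: dirichlet_posterior_mean([sum(d == k and s == v for d, s in records) for v in dom])
    for k, dom in score_domains.items()
}
```

Here `p_diagnosis_hat` is [4/9, 3/9, 2/9] and `p_score_hat["A"]` is [0.4, 0.6].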

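Setting up the parent multinomial seems straightforward, like this:

```python
import numpy as np
import pymc3 as pm

with pm.Model() as model:
    alphas_diagnosis = np.array([1., 1., 1.])
    p_diagnosis = pm.Dirichlet("p_diagnosis", a=alphas_diagnosis)
    # Multinomial expects per-category counts; assumes data[:, 0] holds diagnoses coded 0, 1, 2
    diagnosis_counts = np.bincount(data[:, 0].astype(int), minlength=3)
    Xdiagnosis_ = pm.Multinomial("Xdiagnosis_", n=data.shape[0], p=p_diagnosis, observed=diagnosis_counts)
```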

Could anyone point me to the right way to set up the child multinomial conditioned on the parent?
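One way to think about the child-given-parent structure is through the factorization P(D, S) = P(D) · P(S | D): the joint log-likelihood splits into a parent term plus one child term per diagnosis, each involving only that diagnosis's score probabilities. That suggests (as an untested suggestion, not a verified PyMC3 recipe) one `pm.Dirichlet` per diagnosis, each with a `pm.Categorical` likelihood on the scores of the records having that diagnosis. The NumPy sketch below only checks the factorization with fixed, hypothetical parameters; the integer coding and the size of the “C” domain are assumptions.

```python
import numpy as np

# Hypothetical integer-coded data: diagnosis index 0="A", 1="B", 2="C",
# and a 0-based score index within that diagnosis's own domain.
diag_idx = np.array([0, 0, 1, 1, 1, 2])
score_idx = np.array([1, 0, 3, 0, 2, 1])

# Fixed example parameters; the "C" domain size of 3 is an assumption.
p_diagnosis = np.array([0.5, 0.3, 0.2])
p_score = [np.array([0.4, 0.6]),
           np.array([0.1, 0.2, 0.3, 0.4]),
           np.array([0.2, 0.5, 0.3])]

def joint_loglik(diag_idx, score_idx):
    """log P(D, S) summed over records, using P(D, S) = P(D) * P(S | D)."""
    return sum(np.log(p_diagnosis[d]) + np.log(p_score[d][s])
               for d, s in zip(diag_idx, score_idx))

# The same quantity split into one parent term plus one child term per diagnosis.
parent_term = np.log(p_diagnosis[diag_idx]).sum()
child_terms = [np.log(p_score[k][score_idx[diag_idx == k]]).sum() for k in range(3)]
```

Because each child term depends only on its own `p_score[k]`, the per-diagnosis score distributions can be given independent priors of different lengths.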

A side note: in my experiments I also noticed that the `observed` argument takes an array of summarized counts of category occurrences, rather than an array of the individual observed categorical values.
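The counts-versus-individual-values observation can be checked numerically: a multinomial likelihood on the counts and a categorical likelihood on the individual values differ only by the multinomial coefficient, which does not depend on p, so both lead to the same posterior for p. In PyMC3 the individual integer-coded values would, as far as I know, go into a `pm.Categorical` distribution instead of `pm.Multinomial`. The sketch below uses plain NumPy and hypothetical codes.

```python
import math
import numpy as np

def categorical_loglik(p, codes):
    """Sum of log p[x_i] over individual observations (a categorical likelihood)."""
    return float(np.log(p[codes]).sum())

def multinomial_loglik(p, counts):
    """log Multinomial(counts | n=sum(counts), p), including the multinomial coefficient."""
    n = int(sum(counts))
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    return log_coef + float(np.sum(np.asarray(counts) * np.log(p)))

codes = np.array([0, 2, 1, 0, 0, 2])       # individual diagnoses, integer-coded
counts = np.bincount(codes, minlength=3)   # summarized counts per category

# The difference is the same constant, log(6! / (3! 1! 2!)) = log(60), for every p.
diffs = [multinomial_loglik(p, counts) - categorical_loglik(p, codes)
         for p in (np.array([0.5, 0.2, 0.3]), np.array([0.2, 0.3, 0.5]))]
```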

Best regards

Chris