# Bayes Nets, Belief Networks, and PyMC

So I am trying to get my head around how discrete Bayes Nets (sometimes called Belief Networks) relate to the kind of Bayesian networks used all the time in PyMC3/Stan/etc. Here's a concrete example, a collider structure in which smoking and COVID both influence whether you end up in hospital:

This can be implemented in pomegranate (just one of the relevant Python packages) as:

```python
import pomegranate as pg

smokeD = pg.DiscreteDistribution({'yes': 0.25, 'no': 0.75})
covidD = pg.DiscreteDistribution({'yes': 0.1, 'no': 0.9})
hospitalD = pg.ConditionalProbabilityTable(
    [['yes', 'yes', 'yes', 0.9], ['yes', 'yes', 'no', 0.1],
     ['yes', 'no', 'yes', 0.1], ['yes', 'no', 'no', 0.9],
     ['no', 'yes', 'yes', 0.9], ['no', 'yes', 'no', 0.1],
     ['no', 'no', 'yes', 0.01], ['no', 'no', 'no', 0.99]],
    [smokeD, covidD])

smoke = pg.Node(smokeD, name="smokeD")
covid = pg.Node(covidD, name="covidD")
hospital = pg.Node(hospitalD, name="hospitalD")

model = pg.BayesianNetwork("Covid Collider")
model.add_states(smoke, covid, hospital)
model.add_edge(smoke, hospital)
model.add_edge(covid, hospital)
model.bake()
```


You could then calculate P(covid|smoking, hospital) = 0.5 with

```python
model.predict_proba({'smokeD': 'yes', 'hospitalD': 'yes'})
```


and P(covid|¬smoking, hospital)=0.91 with

```python
model.predict_proba({'smokeD': 'no', 'hospitalD': 'yes'})
```


This is interesting as it demonstrates collider bias: it looks as if smoking has a protective effect against COVID if you condition on being in hospital. See *Why most studies into COVID19 risk factors may be producing flawed conclusions* for a really interesting preprint on this.
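Since the network is tiny, those two numbers can be checked exactly with Bayes' rule, no inference library needed. A minimal sketch (plain Python; the dictionary layout and helper name are my own):

```python
# Exact check of the collider numbers via Bayes' rule.
# 1 = 'yes', 0 = 'no'; the tables match the pomegranate CPT above.
p_covid = {1: 0.1, 0: 0.9}
p_hospital = {(1, 1): 0.9, (1, 0): 0.1,    # P(hospital=1 | smoke, covid)
              (0, 1): 0.9, (0, 0): 0.01}

def p_covid_given_hospital(smoke):
    """P(covid=1 | smoke, hospital=1)."""
    num = p_hospital[(smoke, 1)] * p_covid[1]
    den = num + p_hospital[(smoke, 0)] * p_covid[0]
    return num / den

print(p_covid_given_hospital(1))  # 0.5
print(p_covid_given_hospital(0))  # 0.909...
```

Note that P(smoke) cancels out of the ratio once we condition on the smoking state, so it never appears in the calculation.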

## Q1: Is there anything fundamentally different from these kinds of networks compared to the typical models used in PyMC3?

These discrete Bayes Nets are often discussed in relation to causal inference. Is there anything fundamentally different about networks of discrete variables that allows causal claims to be made which can’t be done in continuous or hybrid discrete + continuous networks? I assume the answer is no, but would love to know if that’s not the case, or if it’s still debated.

These Bayes Net / Belief Network packages allow you to specify the joint distribution then make multiple conditional distribution queries. But PyMC models are always conditional probability distributions (?), conditioned on the data. And so if you want to make multiple queries, then you need to compose multiple models?

These Bayes Net / Belief Network packages do have structure learning in them - as in you provide data and the code spits out a most likely network. As far as I understand that is not a feature that PyMC aspires to?

## Q2: Can PyMC evaluate these kinds of models?

It seems that there is nothing fundamental getting in the way of PyMC being able to deal with these Bayes Nets, at least in terms of calculating conditional probabilities. Although it might not be as direct as the pomegranate example above, because:

• You have to specify the appropriate discrete node type rather than just a list of probabilities. But that doesn't seem like a big deal.
• PyMC models don't have a Conditional Probability Table concept (or do they?). For binary variables I guess you can use the 'indicator variable' approach. I'm not sure whether you can just have a function with some if/elif logic to deal with nodes that have multiple discrete category levels?

I’d be very grateful to anyone who can confirm I’m on the right track or to correct where I might be going wrong in my understanding. It could be very useful in general for a bit of a discussion to highlight the boundaries of PyMC and where it’s best to hand over to other related packages.


I don't think there is anything fundamentally different; after all, they are mathematical models to which you apply the chain rule and product rule of probability, and the difference between discrete and continuous is mostly in how you do marginalization. You can express a Bayes Net with discrete distributions (i.e., Categorical distributions), and a PyMC3 Model will then construct a joint distribution conditioned on the observed data (but since you can also construct a model with no observed data, you don't need to label it a conditional probability distribution).

Concretely, my approach would go like this:

```python
import pymc3 as pm
import numpy as np

import theano
import theano.tensor as tt

# P(hospital=1 | smoker, covid), indexed as lookup_table[smoker, covid]
lookup_table = theano.shared(np.asarray([
    [[.99, .01], [.1, .9]],
    [[.9, .1], [.1, .9]]]))

def f(smoker, covid):
    return lookup_table[smoker, covid]

with pm.Model() as m:
    smoker = pm.Categorical('smoker', [.75, .25])
    covid = pm.Categorical('covid', [.9, .1])
    hospital = pm.Categorical('hospital', f(smoker, covid))
    prior_trace = pm.sample_prior_predictive(100000)

predict_proba0 = prior_trace['covid'][
    (prior_trace['smoker'] == 0)
    & (prior_trace['hospital'] == 1)].mean()
predict_proba1 = prior_trace['covid'][
    (prior_trace['smoker'] == 1)
    & (prior_trace['hospital'] == 1)].mean()

print(f'P(covid|¬smoking, hospital) is {predict_proba0}')
print(f'P(covid|smoking, hospital) is {predict_proba1}')
```


which gives an approximation:

```
P(covid|¬smoking, hospital) is 0.9103626239304631
P(covid|smoking, hospital) is 0.4927695004382121
```


That’s really neat. I don’t know about all these theano tricks. And I didn’t even think of sampling from the prior and conditioning on variables in that way.


Note that you will need quite a lot of samples to get an accurate estimate, especially if you have small values in the lookup table. But I think this approach is nice as it is quite general (you can do the same for continuous distributions, and use `(prior_trace['some_variable'] > a) & (prior_trace['some_variable'] < b)` to compute the conditional within an interval).
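To illustrate the interval-conditioning idea on a continuous variable, here is a minimal NumPy-only sketch of the same trick; the toy model and the interval are made up for illustration:

```python
import numpy as np

# Toy joint prior: x ~ N(0, 1), y = x + independent N(0, 1) noise.
rng = np.random.default_rng(0)
n = 1_000_000
x = rng.normal(0.0, 1.0, n)
y = x + rng.normal(0.0, 1.0, n)

# Condition on interval evidence a < y < b by subsetting the prior samples.
subset = (y > 0.5) & (y < 1.5)
print(x[subset].mean())  # Monte Carlo estimate of E[x | 0.5 < y < 1.5]
```

For this conjugate toy model the exact conditional is available analytically (E[x | y] = y / 2), so the subset mean should come out near 0.48. The same subsetting works directly on the arrays in `prior_trace` from `sample_prior_predictive`.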

Otherwise, you can implement loopy belief propagation, using the logic in `draw_values` to sort out the graph structure.


I understood the first part of that.

Would you also be able to use a similar approach if you had uncertain observations? For example, if you were 100% sure they were in hospital, but only 80% sure they were a smoker.

Here's a neat example from chapter 7 of *Risk Assessment and Decision Analysis with Bayesian Networks* by Fenton & Neil, to check the numbers come out as expected.

```python
import pymc3 as pm
import numpy as np

import theano
import theano.tensor as tt

names = ['martin_oversleeps', 'train_strike', 'martin_late', 'norman_late']

# P(martin_late | train_strike, martin_oversleeps) ------------------------
martin_late_lookup = theano.shared(np.asarray([
    [[.7, .3], [.4, .6]],
    [[.4, .6], [.2, .8]]]))

def p_martin_late(train_strike, martin_oversleeps):
    return martin_late_lookup[train_strike, martin_oversleeps]

# P(norman_late | train_strike) -------------------------------------------
p_norman_late_lookup = theano.shared(np.asarray([[0.9, .1], [.2, .8]]))

def p_norman_late(train_strike):
    return p_norman_late_lookup[train_strike]

# Build model -------------------------------------------------------------
with pm.Model() as m:
    martin_oversleeps = pm.Categorical('martin_oversleeps', [.6, .4])
    train_strike = pm.Categorical('train_strike', [.9, .1])
    norman_late = pm.Categorical('norman_late',
                                 p_norman_late(train_strike))
    martin_late = pm.Categorical('martin_late',
                                 p_martin_late(train_strike, martin_oversleeps))
    prior = pm.sample_prior_predictive(500_000)


def marginal_distributions(trace, names):
    """Print the marginal probability of each node.
    trace = samples from the prior
    names = list of strings of node names
    """
    for name in names:
        print(f"P({name}) = {trace[name].mean()*100:.2f}%")


def conditional_probabilities(trace, names, evidence):
    """Print the probability of each variable, conditioned on the evidence.
    trace = samples from the prior
    names = list of strings of node names
    evidence = {'norman_late': 1, 'martin_late': 1} (for example)
    """
    n_samples = len(list(trace.values())[0])
    for name in names:
        # Boolean subset of samples conforming to the evidence
        subset = np.full(n_samples, True, dtype=bool)
        for key, val in evidence.items():
            subset = subset & (trace[key] == val)
        # Mean of the variable of interest within that subset
        prob = trace[name][subset].mean()
        # Compose output string
        evidence_str = ", ".join(f"{key} = {value}"
                                 for key, value in evidence.items())
        print(f"P({name} | {evidence_str}) = {prob*100:.2f}%")
```


So now you can do things like

```python
marginal_distributions(prior, names)
```


and get

```
P(martin_oversleeps) = 40.00%
P(train_strike) = 10.07%
P(martin_late) = 44.58%
P(norman_late) = 17.01%
```


or condition upon evidence provided in the form of a dictionary

```python
conditional_probabilities(prior, names, {'norman_late': 1, 'martin_late': 1})
```


and get

```
P(martin_oversleeps | norman_late = 1, martin_late = 1) = 51.51%
P(train_strike | norman_late = 1, martin_late = 1) = 59.41%
P(martin_late | norman_late = 1, martin_late = 1) = 100.00%
P(norman_late | norman_late = 1, martin_late = 1) = 100.00%
```


Pretty cool! Although I've not checked it's robust, e.g. for nodes with more than 2 category levels.
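Because this network is also small, the sampled percentages can be verified exactly by brute-force enumeration of the joint distribution. A quick sketch (the helper function is my own, not part of the PyMC model; the tables match the lookup tables above):

```python
from itertools import product

p_mo = [0.6, 0.4]                    # P(martin_oversleeps)
p_ts = [0.9, 0.1]                    # P(train_strike)
p_ml = [[[0.7, 0.3], [0.4, 0.6]],    # P(martin_late | ts, mo)
        [[0.4, 0.6], [0.2, 0.8]]]
p_nl = [[0.9, 0.1], [0.2, 0.8]]      # P(norman_late | ts)

def exact_conditional(query, evidence):
    """P(query = 1 | evidence), by summing the joint over all states."""
    num = den = 0.0
    for mo, ts, ml, nl in product([0, 1], repeat=4):
        state = {'martin_oversleeps': mo, 'train_strike': ts,
                 'martin_late': ml, 'norman_late': nl}
        if any(state[k] != v for k, v in evidence.items()):
            continue
        p = p_mo[mo] * p_ts[ts] * p_ml[ts][mo][ml] * p_nl[ts][nl]
        den += p
        if state[query] == 1:
            num += p
    return num / den

ev = {'norman_late': 1, 'martin_late': 1}
print(exact_conditional('train_strike', ev))       # ≈ 0.590
print(exact_conditional('martin_oversleeps', ev))  # ≈ 0.512
```

which agrees with the sampled 59.41% and 51.51% to within Monte Carlo error.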


Just to say this was a super interesting thread, with a nice illustration of collider bias as a bonus. Thanks guys!

Also, out of curiosity @drbenvincent, have you already used pomegranate for models in production? What do you think are its main strengths and weaknesses? And why do you want to implement this model in PyMC instead of sticking with pomegranate?


Glad it’s of interest to at least one other person! So my introduction to practical Bayesian problems was JAGS, then PyMC3 and a bit of STAN. That gave me the impression that I had a reasonable overview of DAGs in the probabilistic programming space - but coming across these discrete Bayes Nets threw me a bit.

Some of my academic work is taking me down the road of reading about causal inference and errors in reasoning (like collider bias). All that reading (so far) seems to focus on discrete Bayes Nets and a totally different set of software tools. There are BayesiaLab and AgenaRisk, which seem very full-featured but are expensive and GUI-based. If I ever moved from academia into data science/decision support, then using these would probably be pretty good, to be honest. Free and code-based packages include pomegranate, pgmpy, and bnlearn, and probably more. There's also bnlearn for R (from a different developer), which might be the most developed; I've got the book on order. And there's BayesNets.jl for Julia, which looks cool, but the docs are a bit minimal.

I was basically trying to work out how much overlap there was, how much I could generalise what I know from JAGS/PyMC3 to these Bayes Nets. It seems, with help from @junpenglao, that there is no extra set of concepts as such, just different implementation details, with a focus on exact solutions. So it looks like the intuitions about inference can generalise over.

For this particular thread of work, I probably will move forward with pomegranate, or bnlearn, to be honest. Being built for the task, they have some nice features: for example, you can explicitly name the levels of your categorical factors, e.g. $\mathrm{COVID~severity} = \{ \mathrm{mild, moderate, severe} \}$, and you can easily write down conditional probability tables.

It seems like there is way more development effort, and more people, focused on PyMC3/Stan-type frameworks than on the Bayes Net packages. They seem more polished, with more complete docs. But hopefully I'll get up to speed with the Bayes Net packages soon.


Thanks for the detailed answer! It makes sense.
I'm pretty new to the Bayes Net side too, but it's nice to see that the difference between the Bayes Net packages and PyMC/Stan is not one of concepts but one of implementation and focus, as you said. That should make moving back and forth between the two quite easy.

BTW, I think your work and all these topics are super interesting and would make for a very good podcast episode – if you feel like it, I’d be happy to host you!


That could be pretty cool. Let me listen to a few episodes to gauge whether my impostor syndrome is warranted or not.


Ha ha, TBH the biggest impostor is always me in comparison to my guests, so it'll be OK. Let's continue talking in private.
