Bayes Nets, Belief Networks, and PyMC

Note that you will need quite a lot of samples to get accurate estimation, especially if you have small value in the look up table. But I think this approach is nice as it is quite general (you can do the same for continuous distribution, and do prior_trace['some_variable'] > a prior_trace['some_variable'] < b to compute the conditional within an interval).

Otherwise, you can implement a loopy belief propagation using the logic in draw_values to sort out the graph structure.


I understood the first part of that.

Would you also be able to use a similar approach if you had uncertain observations. For example if you were 100% sure they were in the hospital, but 80% sure they were a smoker.

Here’s a neat example from chapter 7 of Risk Assessment and Decision Analysis with Bayesian Networks, by Fenton & Neil to check the numbers are coming out as expected.

import pymc3 as pm
import numpy as np

import theano
import theano.tensor as tt

names = ['martin_oversleeps', 'train_strike', 'martin_late', 'norman_late']

# P(martin_late|train_strike, martin_oversleeps) -------------------------

martin_late_lookup = theano.shared(np.asarray([
    [[.7, .3], [.4, .6]],
    [[.4, .6], [.2, .8]]]))

def p_martin_late(train_strike, martin_oversleeps):
    return martin_late_lookup[train_strike, martin_oversleeps]

# P(norman_late|train_strike) --------------------------------------------
p_norman_late_lookup = theano.shared(np.asarray([[0.9, .1], [.2, .8]]))

def p_norman_late(train_strike):
    return p_norman_late_lookup[train_strike]

# Build model ------------------------------------------------------------
with pm.Model() as m:
    martin_oversleeps = pm.Categorical('martin_oversleeps', [.6, .4])
    train_strike = pm.Categorical('train_strike', [.9, .1])
    norman_late = pm.Categorical('norman_late', 
    martin_late = pm.Categorical('martin_late', 
                                 p_martin_late(train_strike, martin_oversleeps))
    prior = pm.sample_prior_predictive(500_000)

Then added some helper functions

def marginal_distributions(trace, names):
    """Samples from the prior.
    trace = samples from prior
    names = list of strings of node names
    for name in names:
        print(f"P({name}) = {trace[name].mean()*100:.2f}%")

def conditional_probabilities(trace, names, evidence):
    """Output the probabilities of all the variables, conditioned on the evidence.
    trace = samples from prior
    names = list of strings of node names
    evidence = {'norman_late': 1, 'martin_late': 1} (for example)
    n_samples = len(list(trace.values())[0])
    for name in names:

        # Calculate Boolean subset of samples confiorming to the evidence
        subset = np.full(n_samples, True, dtype=bool)
        for key, val in evidence.items():
            subset = (subset) & (trace[key] == val)

        # Calculate mean of variable of interest consistent with the evidence
        prob = trace[name][subset == 1].mean()

        # Compose string
        string = f"P({name} | "
        for key, value in evidence.items():
            string += key + f" = {value}"
            string += ", "
        string += f") = {prob*100:.2f}%"

So now you can do things like

marginal_distributions(prior, names)

and get

P(martin_oversleeps) = 40.00%
P(train_strike) = 10.07%
P(martin_late) = 44.58%
P(norman_late) = 17.01%

or condition upon evidence provided in the form of a dictionary

conditional_probabilities(prior, names, {'norman_late': 1, 'martin_late': 1})

and get

P(martin_oversleeps | norman_late = 1, martin_late = 1, ) = 51.51%
P(train_strike | norman_late = 1, martin_late = 1, ) = 59.41%
P(martin_late | norman_late = 1, martin_late = 1, ) = 100.00%
P(norman_late | norman_late = 1, martin_late = 1, ) = 100.00%

Pretty cool. Although I’ve not checked it’s robust, eg for nodes with >2 category levels


Just to say it was a super interesting thread, with a nice illustration of collider bias in bonus :wink: Thanks guys :slightly_smiling_face:

Also, out of curiosity @drbenvincent, did you already use pomegranate for models in production? What do you think are its main strenghts and weaknesses? And why do you want to implement this given model in PyMC instead of sticking to pomegranate?


Glad it’s of interest to at least one other person! So my introduction to practical Bayesian problems was JAGS, then PyMC3 and a bit of STAN. That gave me the impression that I had a reasonable overview of DAGs in the probabilistic programming space - but coming across these discrete Bayes Nets threw me a bit.

Some of my academic work is taking me down the road of reading about causal inference and errors in reasoning (like collider bias). All that reading (so far) seems to focus on discrete Bayes Nets and a totally different set of software tools. There seems to be BayesiaLab and AgenaRisk which seem very full featured, but are expensive and GUI-based. If I ever moved from academia into data science/decision support, then using these would probably be pretty good to be honest. Free and code-based packages include pomegranate, pgmpy, and bnlearn, and probably more. There’s also bnlearn for R (different developer) which might be the most developed. I’ve got the book on order. And there’s BayesNets.jl for Julia which looks cool, but the docs are a bit minimal.

I was basically trying to work out how much overlap there was, how much I could generalise what I know from JAGS/PyMC3 to these Bayes Nets. It seems, with help from @junpenglao, that there is no extra set of concepts as such, just different implementation details, with a focus on exact solutions. So it looks like the intuitions about inference can generalise over.

For this particular thread of work, I probably will move forward with pomegranate, or bnlearn, to be honest. Being built for the task, they have some nice features, e.g. it’s nice that you can explicitly name the levels of your categorical factors \mathrm{COVID~severity} = \{ \mathrm{mild, moderate, severe} \} and you can easily write down conditional probability tables.

It seems like there is way more development effort/people who focus on PyMC3, STAN, type frameworks than the Bayes Net packages. They seem more polished, with more complete docs. But hopefully I’ll get up to speed with the Bayes Net packages soon.


Thanks for the detailed answer! It makes sense.
I’m pretty new to the Bayes Net side too, but it’s nice to see that the difference btw the Bayes Nets packages and PyMC/Stan is not one of concepts but one of implementation and focus, as you said. Should make going back btw the two quite easy :ok_hand:

BTW, I think your work and all these topics are super interesting and would make for a very good podcast episode – if you feel like it, I’d be happy to host you!


That could be pretty cool :slight_smile: Let me listen to a few episodes to gauge whether my imposter syndrome is warranted or not :+1:

1 Like

Ha ha, TBH the biggest imposter is always me in comparison to my guests, so it’ll be ok :wink: Let’s continue talking in private.

1 Like

this is a very insightful conversation you are havin here, thanks :slight_smile:

I have come across the thread because I, too, am trying to combine the powers of pymc with causal inference (in the DAG/Bayes Nets framework). [For example, I found causalnex for CI with DAGs, though not using PyMC.]

Generally, is anyone of you aware of such libraries or current efforts?


This is really interesting thread. I was going to post a question similar to the one asked here. But the PyMC discourse summary that I receive in email brought my attention to this thread. I’m working on an implementation of Bayes Net for modeling the risk of accidents (predict likelihood of an accident). The network is certainly bigger and a little complex than the three node network here. I have both continuous and categorical variables in my network. And I have ventured into the field of Bayesian just recently and still somewhere half way on my learning curve.

I did a very limited reading about various open source tools available during which I came across PyMC. Without doing a deep dive into the capabilities of PyMC i directly started running MCMC simulations using the accident data that I had to come up with conditional probability distributions, for every possible parent-child subset of the network. That allowed me to understand the dependencies be it a causal or statistical correlation for every subset of the network.

But then I started to realize that PyMC does not have the capability to propagate the probabilities to the leaf node (or leaves). Just for demonstration, I used the conditional probability distributions for each subnetwork to develop a CPT and then manually implemented the variable elimination method for belief updating.

After reading through the replies in this thread, I want to give pomegranate a try. My question to @drbenvincent is - can pomegranate run MCMC simulations to arrive at the CPT or the user needs to actually specify the CPT like your implementation of the COIVD19 collider-bias problem?

It’s just been about a year and a half since my venturing into the Bayesian world and I certainly have more questions to ask than provide answers. I would greatly appreciate if I can post more questions related to this topic here, or in a separate thread.

And a shout out to @AlexAndorra and @junpenglao for having helped me with my previous questions on this forum.

I’m willing to start a private conversation with someone having domain experience as the current problem I’m working can lead to collaborative prospects with consulting opportunities.


Regarding Bayes Nets, conditioning, and causal inference, do you have any experience with pyro [even though its not “us” ;)]? It says that you can do conditioning explicitly and also has a do-operator implementation:

To be honest I’m not sure. Presumably this would be doable in PyMC3 having priors over parameters in the CPT?

EDIT: I just checked my copy of Bayesian Networks: with example in R and section 1.4 talks about using the bnlearn package to estimate the CPT values with MLE

I’m very far from being an expert in these Bayes Nets by the way - I learnt just enough to get through a paper I was working on. If anyone is interested, it’s currently at the preprint stage on bioRxiv: Cycling and prostate cancer risk – Bayesian insights from observational study data


Would it be possible to include the conditions using likelihoods? (and sampling normally)
For example, for the first query something like


and for the second query


If so, what would be the right way to specify a likelihood that gives probability one for the observed value and 0 otherwise?

I’ve been working on a Bayes net-related problem with PyMC3 as well. Essentially, the basic idea is that I have a Bayes net (albeit actually an undirected version) with several categorical variables and a known graphical structure. However, when it comes time to generate predictions on new data, sometimes we want to make use of spatial proximity of other observations so I adjust the CPT entries using a spatial Markov random field. I wrote a custom step method for propagating Gibbs sampling within the BN. The work isn’t quite ready for prime time yet but I’m happy to talk about it.


Good questions.
I actually came to Bayesian inference after studying Bayesian Networks. About 5 years after.

To my mind Bayesian Networks are the most straight forward application of Bayes rule; in that they deal with discrete probabilities, and they are networks of contingency tables. And the mathematics to build them and use them for deduction or inference needs only application of Bayes theorem, the chain and product rule. Mathematically and computationally it is not that hard, relatively.

Bayesian Networks generally do not deal with continuous probability distributions though. (Some software packages have the abilitiy to discretize continuous distributions into a limited number of discrete buckets). They are also just one class of model.

The implementation of markov chain monte carlo algorithms in PyMC and Stan etc. are an attempt to extend the use of Bayes rule a) beyond simple contingency table-like applications of Bayes, b) to extend the use of Bayes into inference problems with continuous probability distributions and c) to use Bayesian inference determine the parameters of any mathematical model.

Or to put it another way … a Bayesian Network is a direct manifestation of Bayes Theorem.
PyMC uses Bayes Theorem as a tool to determine the parameters of other models … i.e. its a stepping stone to parameterise another model.

PyMC is not the ultimate target model itself; its an intermediate Bayesian approach to solve different problems. Whereas a Bayesian Network is the target model.

Personally, I can’t see the point in using PyMC to infer a conventional (discrete) Bayesian Network … but you may have applications that I haven’t imagined.

If you’ve not seen it already, this Windows package is the sleekest, most cost-effective tool I’ve yet come across for building (either by hand or inferring from data), and then using Bayesian Networks. Its an easier place to start than say Pomegranate in Python if you want to study and learn the concepts.


Any PyMC dev interest in building a proper bayesian/belief network class/module? I’ve got some free time… :man_shrugging:


What’s your preferred python-ecosystem library for bayesian/belief network modeling?

I just came across this thread. In 2018 I wrote a python interface to R bnlearn: I updated this to rpy2 version 3 last year. In 2018 R bnlearn was the most comprehensive free solution for working with bayesian networks. Have a look at some of the examples: There is also an example that demonstrates parameter learning.

The book Bayesian Networks in R also helps.


hi everyone,

this is a great thread! I am facing a similar question currently:

Currently working in Academia and fluent in both R and Python. I worked a lot with Bayesian (Hierarchical Generalized) Linear Models but due to interest in causal modeling encountered more and more literature about Bayesian Networks. I am looking for a great package to implement Bayesian Networks for both my academic work but also such that I might be able to step over to Industry after my current contract.

Any new developments on which package might be the best to built this competence @drbenvincent ?

Pyro looks really very interesting but I have not seen people talk about it much here. If people can give recommendations on which packages work best that would be great! With best I mean that a package is as such that it is versatile by allowing different use cases of Bayesian Networks as well as allowing different most common operations that are common in industry and academia.