Bayes Nets, Belief Networks, and PyMC

So I am trying to get my head around how discrete Bayes Nets (sometimes called Belief Networks) relate to the kind of Bayesian networks used all the time in PyMC3/Stan/etc. Here’s a concrete example:
[Figure: the collider DAG, smoking → hospital ← covid]

This can be implemented in pomegranate (just one of the relevant Python packages) as:

import pomegranate as pg

smokeD = pg.DiscreteDistribution({'yes': 0.25, 'no': 0.75})
covidD = pg.DiscreteDistribution({'yes': 0.1, 'no': 0.9})
# Columns: smoke, covid, hospital, P(hospital | smoke, covid)
hospitalD = pg.ConditionalProbabilityTable(
    [['yes', 'yes', 'yes', 0.9], ['yes', 'yes', 'no', 0.1],
     ['yes', 'no', 'yes', 0.1], ['yes', 'no', 'no', 0.9],
     ['no', 'yes', 'yes', 0.9], ['no', 'yes', 'no', 0.1],
     ['no', 'no', 'yes', 0.01], ['no', 'no', 'no', 0.99]],
    [smokeD, covidD])

smoke = pg.Node(smokeD, name="smokeD")
covid = pg.Node(covidD, name="covidD")
hospital = pg.Node(hospitalD, name="hospitalD")

model = pg.BayesianNetwork("Covid Collider")
model.add_states(smoke, covid, hospital)
model.add_edge(smoke, hospital)
model.add_edge(covid, hospital)
model.bake()

You could then calculate P(covid|smoking, hospital) = 0.5 with

model.predict_proba({'smokeD': 'yes', 'hospitalD': 'yes'})

and P(covid|¬smoking, hospital) = 0.91 with

model.predict_proba({'smokeD': 'no', 'hospitalD': 'yes'})

This is interesting as it demonstrates collider bias - it looks as if smoking has a protective effect against getting COVID if you condition upon being in hospital. See Why most studies into COVID19 risk factors may be producing flawed conclusions for a really interesting preprint on this.
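As a sanity check, those two numbers drop straight out of Bayes’ rule applied to the CPT above:

# Check P(covid=yes | smoke, hospital=yes) by direct application of Bayes' rule:
# P(c | s, h) = P(h | s, c) P(c) / [P(h | s, c) P(c) + P(h | s, ¬c) P(¬c)]
p_covid = 0.1
p_hosp = {('yes', 'yes'): 0.9, ('yes', 'no'): 0.1,   # keys: (smoke, covid)
          ('no', 'yes'): 0.9, ('no', 'no'): 0.01}

for smoke in ['yes', 'no']:
    num = p_hosp[(smoke, 'yes')] * p_covid
    den = num + p_hosp[(smoke, 'no')] * (1 - p_covid)
    print(f"P(covid | smoke={smoke}, hospital=yes) = {num / den:.2f}")
# P(covid | smoke=yes, hospital=yes) = 0.50
# P(covid | smoke=no, hospital=yes) = 0.91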

Q1: Is there anything fundamentally different from these kinds of networks compared to the typical models used in PyMC3?

These discrete Bayes Nets are often discussed in relation to causal inference. Is there anything fundamentally different about networks of discrete variables that allows causal claims to be made which can’t be done in continuous or hybrid discrete + continuous networks? I assume the answer is no, but would love to know if that’s not the case, or if it’s still debated.

These Bayes Net / Belief Network packages allow you to specify the joint distribution and then make multiple conditional distribution queries. But PyMC models are always conditional probability distributions (?), conditioned on the data. So if you want to make multiple queries, do you need to compose multiple models?

These Bayes Net / Belief Network packages do have structure learning in them - as in, you provide data and the code spits out a most likely network. As far as I understand, that is not a feature that PyMC aspires to?
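(For what it’s worth, my understanding is that structure learning in pomegranate’s 0.x API looks roughly like this - the integer-coded data here is made up purely for illustration:)

import numpy as np
import pomegranate as pg

# Made-up observations, columns = (smoke, covid, hospital), coded 0/1
X = np.array([[1, 0, 1],
              [0, 0, 0],
              [1, 1, 1],
              [0, 1, 1],
              [0, 0, 0]])

# Search for the most likely DAG over the three variables
learned = pg.BayesianNetwork.from_samples(X, algorithm='exact')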

Q2: Can PyMC evaluate these kinds of models?

It seems that there is nothing fundamental getting in the way of PyMC being able to deal with these Bayes Nets, at least in terms of calculating conditional probabilities? Although it might not be as direct as the pomegranate example above because:

  • You have to specify the appropriate discrete node type rather than just specifying a list of probabilities. But that doesn’t seem like a big deal.
  • PyMC models don’t have a Conditional Probability Table concept (or do they?). For binary variables I guess you can use the ‘indicator variable’ approach. I’m not sure if you can just have a function with some if and elif action to deal with nodes that have multiple discrete category levels?

I’d be very grateful to anyone who can confirm I’m on the right track, or correct where I might be going wrong in my understanding. A bit of discussion could also be very useful in general, to highlight the boundaries of PyMC and where it’s best to hand over to other related packages.

6 Likes

I don’t think there is anything fundamentally different - after all, they are mathematical models to which you apply the chain rule and product rule of probability, and the difference between discrete and continuous is mostly in how you do the marginalization. You can express a Bayes Net with discrete distributions (i.e., Categorical distributions), and a PyMC3 Model will then construct a joint distribution conditioned on the observed data (but since you can also construct a model with nothing observed, you don’t need to label it as a conditional probability distribution).

Concretely, my approach would go like this:

import pymc3 as pm
import numpy as np

import theano

# CPT for P(hospital | smoker, covid):
# lookup_table[smoker, covid] = [P(hospital=0), P(hospital=1)]
lookup_table = theano.shared(np.asarray([
    [[.99, .01], [.1, .9]],
    [[.9, .1], [.1, .9]]]))

def f(smoker, covid):
    return lookup_table[smoker, covid]

with pm.Model() as m:
    smoker = pm.Categorical('smoker', [.75, .25])
    covid = pm.Categorical('covid', [.9, .1])
    hospital = pm.Categorical('hospital', f(smoker, covid))
    prior_trace = pm.sample_prior_predictive(100000)

predict_proba0 = prior_trace['covid'][
    (prior_trace['smoker'] == 0)
  & (prior_trace['hospital'] == 1)].mean()
predict_proba1 = prior_trace['covid'][
    (prior_trace['smoker'] == 1)
  & (prior_trace['hospital'] == 1)].mean()

print(f'P(covid|¬smoking, hospital) is {predict_proba0}')
print(f'P(covid|smoking, hospital) is {predict_proba1}')

from which you get an approximation of:

P(covid|¬smoking, hospital) is 0.9103626239304631
P(covid|smoking, hospital) is 0.4927695004382121
12 Likes

That’s really neat. I didn’t know about all these theano tricks. And I didn’t even think of sampling from the prior and conditioning on variables in that way. :+1:

1 Like

Note that you will need quite a lot of samples to get accurate estimates, especially if you have small values in the lookup table. But I think this approach is nice as it is quite general (you can do the same for continuous distributions, and use (prior_trace['some_variable'] > a) & (prior_trace['some_variable'] < b) to compute the conditional within an interval).
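A minimal sketch of that interval trick (the model below is made up purely for illustration):

import pymc3 as pm

with pm.Model():
    x = pm.Normal('x', 0., 1.)
    y = pm.Normal('y', x, 1.)
    trace = pm.sample_prior_predictive(100_000)

# Estimate P(y > 0 | 0.5 < x < 1.5) by filtering the prior samples
subset = (trace['x'] > 0.5) & (trace['x'] < 1.5)
print((trace['y'][subset] > 0).mean())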

Otherwise, you can implement a loopy belief propagation using the logic in draw_values to sort out the graph structure.

4 Likes

I understood the first part of that.

Would you also be able to use a similar approach if you had uncertain observations? For example, if you were 100% sure they were in the hospital, but only 80% sure they were a smoker.

Here’s a neat example from chapter 7 of Risk Assessment and Decision Analysis with Bayesian Networks, by Fenton & Neil, to check the numbers are coming out as expected.

import pymc3 as pm
import numpy as np

import theano

names = ['martin_oversleeps', 'train_strike', 'martin_late', 'norman_late']

# P(martin_late | train_strike, martin_oversleeps) ------------------------
# martin_late_lookup[strike, oversleeps] = [P(late=0), P(late=1)]
martin_late_lookup = theano.shared(np.asarray([
    [[.7, .3], [.4, .6]],
    [[.4, .6], [.2, .8]]]))

def p_martin_late(train_strike, martin_oversleeps):
    return martin_late_lookup[train_strike, martin_oversleeps]

# P(norman_late|train_strike) --------------------------------------------
p_norman_late_lookup = theano.shared(np.asarray([[0.9, .1], [.2, .8]]))

def p_norman_late(train_strike):
    return p_norman_late_lookup[train_strike]

# Build model ------------------------------------------------------------
with pm.Model() as m:
    martin_oversleeps = pm.Categorical('martin_oversleeps', [.6, .4])
    train_strike = pm.Categorical('train_strike', [.9, .1])
    norman_late = pm.Categorical('norman_late', 
                                 p_norman_late(train_strike))
    martin_late = pm.Categorical('martin_late', 
                                 p_martin_late(train_strike, martin_oversleeps))
    prior = pm.sample_prior_predictive(500_000)

Then I added some helper functions:

def marginal_distributions(trace, names):
    """Print the marginal probability of each node.
    trace = samples from prior
    names = list of strings of node names
    """
    for name in names:
        print(f"P({name}) = {trace[name].mean()*100:.2f}%")

def conditional_probabilities(trace, names, evidence):
    """Output the probabilities of all the variables, conditioned on the evidence.
    trace = samples from prior
    names = list of strings of node names
    evidence = {'norman_late': 1, 'martin_late': 1} (for example)
    """

    n_samples = len(list(trace.values())[0])

    for name in names:

        # Calculate Boolean subset of samples conforming to the evidence
        subset = np.full(n_samples, True, dtype=bool)
        for key, val in evidence.items():
            subset = subset & (trace[key] == val)

        # Calculate mean of variable of interest consistent with the evidence
        prob = trace[name][subset].mean()

        # Compose and print the output string
        evidence_str = ", ".join(f"{key} = {value}" for key, value in evidence.items())
        print(f"P({name} | {evidence_str}) = {prob*100:.2f}%")

So now you can do things like

marginal_distributions(prior, names)

and get

P(martin_oversleeps) = 40.00%
P(train_strike) = 10.07%
P(martin_late) = 44.58%
P(norman_late) = 17.01%
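Those agree with the exact values - e.g. P(norman_late) = 0.9*0.1 + 0.1*0.8 = 0.17, and enumerating the joint for martin_late:

# Exact P(martin_late) by summing over (train_strike, martin_oversleeps)
p_strike, p_oversleeps = 0.1, 0.4
p_late = {(0, 0): .3, (0, 1): .6, (1, 0): .6, (1, 1): .8}  # keys: (strike, oversleeps)

total = sum(p_late[s, o]
            * (p_strike if s else 1 - p_strike)
            * (p_oversleeps if o else 1 - p_oversleeps)
            for s in (0, 1) for o in (0, 1))
print(f"{total:.3f}")  # 0.446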

or condition upon evidence provided in the form of a dictionary

conditional_probabilities(prior, names, {'norman_late': 1, 'martin_late': 1})

and get

P(martin_oversleeps | norman_late = 1, martin_late = 1) = 51.51%
P(train_strike | norman_late = 1, martin_late = 1) = 59.41%
P(martin_late | norman_late = 1, martin_late = 1) = 100.00%
P(norman_late | norman_late = 1, martin_late = 1) = 100.00%

Pretty cool. Although I’ve not checked it’s robust, e.g. for nodes with >2 category levels.
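For what it’s worth, a sketch of how the helper could be made multi-level-safe: for a node with more than two levels, trace[name].mean() is no longer a probability, so you’d tally each level instead (same assumed trace-dict format as above):

import numpy as np

def conditional_probabilities_multilevel(trace, names, evidence):
    """Like conditional_probabilities above, but reports each category
    level separately, so nodes with >2 levels are handled correctly."""
    n_samples = len(list(trace.values())[0])
    subset = np.full(n_samples, True, dtype=bool)
    for key, val in evidence.items():
        subset = subset & (trace[key] == val)
    evidence_str = ", ".join(f"{k} = {v}" for k, v in evidence.items())
    for name in names:
        for level in np.unique(trace[name]):
            prob = (trace[name][subset] == level).mean()
            print(f"P({name} = {level} | {evidence_str}) = {prob*100:.2f}%")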

7 Likes

Just to say this was a super interesting thread, with a nice illustration of collider bias as a bonus :wink: Thanks guys :slightly_smiling_face:

Also, out of curiosity @drbenvincent, have you already used pomegranate for models in production? What do you think are its main strengths and weaknesses? And why do you want to implement this given model in PyMC instead of sticking to pomegranate?

3 Likes

Glad it’s of interest to at least one other person! So my introduction to practical Bayesian problems was JAGS, then PyMC3 and a bit of Stan. That gave me the impression that I had a reasonable overview of DAGs in the probabilistic programming space - but coming across these discrete Bayes Nets threw me a bit.

Some of my academic work is taking me down the road of reading about causal inference and errors in reasoning (like collider bias). All that reading (so far) seems to focus on discrete Bayes Nets and a totally different set of software tools. There are BayesiaLab and AgenaRisk, which seem very full-featured but are expensive and GUI-based. If I ever moved from academia into data science/decision support, then using these would probably be pretty good, to be honest. Free and code-based packages include pomegranate, pgmpy, and bnlearn, and probably more. There’s also bnlearn for R (a different developer), which might be the most developed - I’ve got the book on order. And there’s BayesNets.jl for Julia, which looks cool, but the docs are a bit minimal.

I was basically trying to work out how much overlap there was, how much I could generalise what I know from JAGS/PyMC3 to these Bayes Nets. It seems, with help from @junpenglao, that there is no extra set of concepts as such, just different implementation details, with a focus on exact solutions. So it looks like the intuitions about inference can generalise over.

For this particular thread of work, I probably will move forward with pomegranate, or bnlearn, to be honest. Being built for the task, they have some nice features - e.g. it’s nice that you can explicitly name the levels of your categorical factors (COVID severity = {mild, moderate, severe}) and you can easily write down conditional probability tables.

It seems like there is way more development effort/more people focused on PyMC3/Stan-type frameworks than on the Bayes Net packages. They seem more polished, with more complete docs. But hopefully I’ll get up to speed with the Bayes Net packages soon.

2 Likes

Thanks for the detailed answer! It makes sense.
I’m pretty new to the Bayes Net side too, but it’s nice to see that the difference between the Bayes Net packages and PyMC/Stan is not one of concepts but one of implementation and focus, as you said. Should make going back and forth between the two quite easy :ok_hand:

BTW, I think your work and all these topics are super interesting and would make for a very good podcast episode – if you feel like it, I’d be happy to host you!

2 Likes

That could be pretty cool :slight_smile: Let me listen to a few episodes to gauge whether my imposter syndrome is warranted or not :+1:

1 Like

Ha ha, TBH the biggest imposter is always me in comparison to my guests, so it’ll be ok :wink: Let’s continue talking in private.

1 Like

Hi,
this is a very insightful conversation you are having here, thanks :slight_smile:

I have come across the thread because I, too, am trying to combine the powers of pymc with causal inference (in the DAG/Bayes Nets framework). [For example, I found causalnex for CI with DAGs, though not using PyMC.]

Generally, are any of you aware of such libraries or current efforts?

2 Likes

This is a really interesting thread. I was going to post a question similar to the one asked here, but the PyMC Discourse summary that I receive by email brought my attention to this thread. I’m working on an implementation of a Bayes Net for modeling the risk of accidents (predicting the likelihood of an accident). The network is certainly bigger and a little more complex than the three-node network here. I have both continuous and categorical variables in my network. And I have ventured into the Bayesian field just recently, and am still somewhere halfway up my learning curve.

I did a very limited reading of the various open source tools available, during which I came across PyMC. Without doing a deep dive into the capabilities of PyMC, I directly started running MCMC simulations using the accident data that I had, to come up with conditional probability distributions for every possible parent-child subset of the network. That allowed me to understand the dependencies, be they causal or statistical correlations, for every subset of the network.

But then I started to realize that PyMC does not have the capability to propagate the probabilities to the leaf node (or leaves). Just for demonstration, I used the conditional probability distributions for each subnetwork to develop a CPT and then manually implemented the variable elimination method for belief updating.

After reading through the replies in this thread, I want to give pomegranate a try. My question to @drbenvincent is: can pomegranate run MCMC simulations to arrive at the CPT, or does the user need to actually specify the CPT, as in your implementation of the COVID19 collider-bias problem?

It’s just been about a year and a half since I ventured into the Bayesian world, and I certainly have more questions to ask than answers to provide. I would greatly appreciate it if I could post more questions related to this topic here, or in a separate thread.

And a shout out to @AlexAndorra and @junpenglao for having helped me with my previous questions on this forum.

I’m willing to start a private conversation with someone having domain experience, as the current problem I’m working on can lead to collaborative prospects with consulting opportunities.

2 Likes

Regarding Bayes Nets, conditioning, and causal inference, do you have any experience with pyro [even though it’s not “us” ;)]? It says that you can do conditioning explicitly, and it also has a do-operator implementation: http://pyro.ai/examples/intro_part_ii.html#Conditioning

To be honest I’m not sure. Presumably this would be doable in PyMC3 by placing priors over the parameters in the CPT?

EDIT: I just checked my copy of Bayesian Networks: With Examples in R, and section 1.4 talks about using the bnlearn package to estimate the CPT values with MLE.
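As a very rough sketch of what priors over CPT parameters might look like in PyMC3 (binary nodes, made-up data, all names purely illustrative - I haven’t stress-tested this):

import numpy as np
import pymc3 as pm

# Made-up data; smoker is treated as a fully observed covariate
smoker_data = np.array([0, 1, 0, 0, 1, 1, 0, 0])
hospital_data = np.array([0, 1, 0, 0, 0, 1, 1, 0])

with pm.Model():
    # Prior over the marginal P(covid)
    p_covid = pm.Dirichlet('p_covid', a=np.ones(2))
    covid = pm.Categorical('covid', p_covid, shape=len(smoker_data))
    # One Dirichlet row per (smoker, covid) combination -> a learned CPT
    cpt = pm.Dirichlet('cpt', a=np.ones((2, 2, 2)), shape=(2, 2, 2))
    pm.Categorical('hospital', cpt[smoker_data, covid],
                   observed=hospital_data)
    trace = pm.sample()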

I’m very far from being an expert in these Bayes Nets by the way - I learnt just enough to get through a paper I was working on. If anyone is interested, it’s currently at the preprint stage on bioRxiv: Cycling and prostate cancer risk – Bayesian insights from observational study data

2 Likes

Would it be possible to include the conditions using likelihoods (and sampling normally)? For example, for the first query, something like

pm.Normal("obs1",mu=smoker,sigma=0.0001,observed=1)
pm.Normal("obs2",mu=hospital,sigma=0.0001,observed=0)

and for the second query

pm.Normal("obs1",mu=smoker,sigma=0.0001,observed=1)
pm.Normal("obs2",mu=hospital,sigma=0.0001,observed=1)

If so, what would be the right way to specify a likelihood that gives probability one for the observed value and 0 otherwise?

I’ve been working on a Bayes Net-related problem with PyMC3 as well. Essentially, the basic idea is that I have a Bayes net (albeit actually an undirected version) with several categorical variables and a known graphical structure. However, when it comes time to generate predictions on new data, sometimes we want to make use of the spatial proximity of other observations, so I adjust the CPT entries using a spatial Markov random field. I wrote a custom step method for propagating Gibbs sampling within the BN. The work isn’t quite ready for prime time yet, but I’m happy to talk about it.

3 Likes

Hi
Good questions.
I actually came to Bayesian inference after studying Bayesian Networks. About 5 years after.

To my mind, Bayesian Networks are the most straightforward application of Bayes’ rule, in that they deal with discrete probabilities and they are networks of contingency tables. The mathematics to build them and use them for deduction or inference needs only the application of Bayes’ theorem and the chain and product rules. Mathematically and computationally it is not that hard, relatively speaking.

Bayesian Networks generally do not deal with continuous probability distributions, though. (Some software packages have the ability to discretize continuous distributions into a limited number of discrete buckets.) They are also just one class of model.

The implementation of Markov chain Monte Carlo algorithms in PyMC and Stan etc. is an attempt to extend the use of Bayes’ rule a) beyond simple contingency-table-like applications, b) to inference problems with continuous probability distributions, and c) to determining the parameters of any mathematical model.

Or to put it another way … a Bayesian Network is a direct manifestation of Bayes’ theorem. PyMC uses Bayes’ theorem as a tool to determine the parameters of other models, i.e. it’s a stepping stone to parameterise another model.

PyMC is not the ultimate target model itself; it’s an intermediate Bayesian approach to solving different problems, whereas a Bayesian Network is the target model.

Personally, I can’t see the point in using PyMC to infer a conventional (discrete) Bayesian Network … but you may have applications that I haven’t imagined.

If you’ve not seen it already, this Windows package is the sleekest, most cost-effective tool I’ve yet come across for building (either by hand or by inferring from data) and then using Bayesian Networks: https://www.bayesserver.com/ It’s an easier place to start than, say, pomegranate in Python if you want to study and learn the concepts.

5 Likes

Any PyMC dev interest in building a proper Bayesian/belief network class/module? I’ve got some free time… :man_shrugging:

2 Likes