PyMC3 Conditioning Random Variable on Multiple Discrete Parents


I recently started doing probabilistic programming using PyMC3. In my scenario, I have 3 random variables: On, Triangle, and X, such that X depends on Triangle and On. Triangle and On both follow Bernoulli distributions, and depending on which value they take, the value of X, which follows a Normal, changes.

I wrote up some mock code to test this concept but it doesn’t work. I just started working in this framework, and I know there are potentially many things wrong with this code, but I’m posting this here so you can see what I’ve done.

with pymc3.Model() as model:
        on = pymc3.distributions.discrete.Bernoulli('on', p=.7)
        x_on = pymc3.distributions.continuous.Normal('x_on', 10, .4)
        pTriangle_given_on = 1
        pTriangle_given_not_on = .7
        pTriangle = pymc3.math.switch(on, pTriangle_given_on, pTriangle_given_not_on)
        triangle = pymc3.distributions.discrete.Bernoulli('triangle', p=pTriangle)
        name = None
        x_triangle = None
        if triangle:
            name = pymc3.distributions.discrete.Categorical('name', p=[.3, .2, .1, .1, .2, .1])
            name = pymc3.distributions.discrete.Categorical('name', p=[.1, .5, .4])
        if on:
            x_triangle = pymc3.Deterministic('x_triangle', x_on)
        elif triangle:
            x_triangle = pymc3.Normal('x_triangle', 5, 1)
            x_triangle = numpy.nan
        trace = pymc3.sample()

I’m not sure how to specify the conditional dependence of X on Triangle and On. Any thoughts from you all would be much appreciated.

Condition Categorical Variable on Bernoulli Parents

As a caveat: I’m not sure what the “proper” way to handle NA-valued outputs is, but below is one way.

Hello and welcome. The core reason the above code doesn’t work is that what you want the if statements to mean is "If the state of on is 1"; but what they’re really asking is "If the variable on is not None".

Ultimately, the value of a random variable is not available until the sampler has started running, so you can’t directly use if on them. The switch statement you have is the appropriate one to use; and you could re-write all of your code to use switch and it would work just fine.

However, when dealing with Bernoulli distributions, it is often more efficient to exploit the fact that multiplication works effectively as a switch. For instance, the (incorrect) code

if on:
  return 5
  return 2

could be implemented correctly as

2 + on * 3

With that in mind, the joint distribution could be defined with:

import pymc3 as pm

p_on = 0.3
p_tri_giv_on = 1.
p_tri_giv_not_on = 0.7
tri_delta_on = p_tri_giv_on - p_tri_giv_not_on

mean_on, sd_on = 10., 0.4
mean_tri, sd_tri = 5., 1.
with pm.Model() as model:
    on = pm.Bernoulli('on', p_on)
    triangle = pm.Bernoulli('triangle', p_tri_giv_not_on + on * tri_delta_on)
    x_lat = pm.Normal('x_l', 0., 1.)  # will transform with `on` and `triangle`
    x = pm.Deterministic('x', NA_ENCODING + \
                              on * (-NA_ENCODING + x_lat * sd_on + mean_on) + \
                             (1 - on)*triangle*(-NA_ENCODING + x_lat * sd_tri + mean_tri))
    res = pm.sample_prior_predictive(1000)

import seaborn as sbn
import pandas as pd
from matplotlib import pyplot as plt
rdf = pd.DataFrame(res)
rdf['sample_num'] = rdf.index

sbn.pointplot(x='sample_num', y='x', data=rdf, join=False)


1 Like

Thank you @chartl !


Quick question @chartl Why is the quoted value 10, rather than 20? Because I see -(-10) + 0 + 10 = 20.


The first term of x is (positive) NA_encoding, which is the default value (i.e. for on=0 and triangle=0). When on=1 then you get

NA_encoding + (-NA_ENCODING + x_lat*sd_on + mean_on) = x_lat*sd_on + mean_on


Ahh ok, yes I see now! Thanks!