It’s a beginner’s question…
Let’s assume we’ve got a graph as in the picture below:

Now you have a lot of data that look like this (all nodes are either True (T) or False (F)):

   D | I | G | S | L
--------------------
1. T | T | F | T | F
2. T | T | T | F | F
etc.

How would you parameterize or define a conditional probability in pymc3 if you wanted to learn the CPD/CPT from data?

How would you parameterize the Grade dependency on both Difficulty and Intelligence?
It could be done with logistic regression, where the parameter of the Grade distribution is defined as theta_grade = alpha + beta_1*difficulty + beta_2*intelligence + error.
But I was wondering if there’s another way… I mean, if you wanted to do MLE, how would you define the graph in pymc3?
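For reference, that logistic parameterization can be sketched in plain numpy (the coefficient values alpha, beta_1, beta_2 below are made up purely for illustration; in pymc3 they would be the unknowns you learn from data):

```python
import numpy as np

def sigmoid(x):
    return 1. / (1. + np.exp(-x))

# hypothetical coefficients; in practice these are what you'd estimate
alpha, beta_1, beta_2 = -1.0, 2.0, 1.5

def p_grade(difficulty, intelligence):
    """P(Grade = T | Difficulty, Intelligence) under the logistic model."""
    theta_grade = alpha + beta_1 * difficulty + beta_2 * intelligence
    return sigmoid(theta_grade)

# e.g. a hard course (difficulty = 1) and a smart student (intelligence = 1)
p = p_grade(1, 1)  # sigmoid(-1.0 + 2.0 + 1.5) = sigmoid(2.5)
```

The sigmoid squashes the linear predictor into (0, 1), so theta_grade can then serve directly as the parameter of a Bernoulli over Grade.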

Not sure I understand — if you have data like that, all your nodes are observed, and you won’t have anything to estimate, since you have no unknowns.

In general, to have a dependency, you make sure the parameter of Grade is some function (e.g., a linear function like the one you have above) of the outputs of Difficulty and Intelligence.

Really what you want to do is estimate the normalization factor (since the integrand for the conditional distribution is just the joint density), so that

P(U|V=v) = P(U, V=v)/N(v)

N(v) is of course just the integral of the joint over U. So you could get this as:

import numpy as np
import pymc3 as pm

with pm.Model() as joint:
    x = pm.Normal('x', 0., 1.)
    y = pm.Normal('y', 0., 1.)
    z_l = pm.Deterministic('zl', x + y)
    z = pm.Normal('z', z_l, 1.)

# sample from the conditional, fixing x = 1
with pm.Model() as conditional:
    x = 1.
    y = pm.Normal('y', 0., 1.)
    z_l = pm.Deterministic('zl', x + y)
    z = pm.Normal('z', z_l, 1.)
    cond_samples = pm.sample(1000)

N = 1000
# evaluate the conditional samples under the joint ...
joint_lp = np.array([joint.logp({'x': 1.,
                                 'y': cond_samples['y'][i],
                                 'z': cond_samples['z'][i]})
                     for i in range(N)])
# ... and under the conditional
cond_lp = np.array([conditional.logp({'y': cond_samples['y'][i],
                                      'z': cond_samples['z'][i]})
                    for i in range(N)])
# Monte Carlo integration: N(v) ~ 1/N * sum p_joint(x_i)/p_cond(x_i)
Vlg = np.log(np.sum(np.exp(joint_lp - cond_lp)) / N)
# define the (approximate) conditional log-density
p_cond = lambda d: joint.logp(d) - Vlg
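As a sanity check, the same Monte Carlo estimate can be done in plain numpy for this particular model (a sketch of my own, not part of the answer above: conditioning on x = 1, the true normalizer N(v) is just the marginal p(x = 1), i.e. the standard normal density at 1 ≈ 0.242):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100000

# draw from the conditional p(y, z | x = 1): y ~ N(0, 1), z | y ~ N(1 + y, 1)
y = rng.normal(0., 1., N)
z = rng.normal(1. + y, 1.)

def norm_logpdf(v, mu, sd):
    # log density of a Normal(mu, sd) evaluated at v
    return -0.5 * np.log(2. * np.pi * sd ** 2) - 0.5 * ((v - mu) / sd) ** 2

# log p(x=1, y, z) under the joint ...
joint_lp = (norm_logpdf(1., 0., 1.)
            + norm_logpdf(y, 0., 1.)
            + norm_logpdf(z, 1. + y, 1.))
# ... and log p(y, z | x=1) under the conditional
cond_lp = norm_logpdf(y, 0., 1.) + norm_logpdf(z, 1. + y, 1.)

# MCI: N(v) = E_cond[p_joint / p_cond]; here the ratio is constant, p(x=1)
Nv = np.mean(np.exp(joint_lp - cond_lp))
```

In this toy case the y and z factors cancel exactly in the ratio, so the estimate recovers the standard normal pdf at 1 with no Monte Carlo noise; in general the ratio varies per sample and the average converges to N(v).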

I found the .logp evaluations to be terribly slow, for some reason.

I would interpret it as P(X1) = P(X1=1), which means P(X1) = sum(X1==1)/len(X1). Similarly, you have P(X3|X1, X2) = sum(X1==1 & X2==1 & X3==1)/sum(X1==1 & X2==1).

Those are not samples generated from the Bayesian network, meaning you can’t look at them as the joint probability.

If you have P(X3|X1, X2) = sum(X1==1 & X2==1 & X3==1)/sum(X1==1 & X2==1),
you already know how to generate a sample of X3 when you know the values of X1 and X2.
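As a concrete illustration of this counting estimate (the data array below is made up; columns are X1, X2, X3 with 1 = True, 0 = False):

```python
import numpy as np

# toy observations, one row per sample: columns X1, X2, X3
data = np.array([[1, 1, 1],
                 [1, 1, 0],
                 [1, 1, 1],
                 [1, 0, 1],
                 [0, 1, 0]])
X1, X2, X3 = data[:, 0], data[:, 1], data[:, 2]

# P(X3=1 | X1=1, X2=1) = count(X1=1 & X2=1 & X3=1) / count(X1=1 & X2=1)
num = np.sum((X1 == 1) & (X2 == 1) & (X3 == 1))
den = np.sum((X1 == 1) & (X2 == 1))
p_x3_given = num / den  # 2 of the 3 matching rows have X3 = 1
```

Doing this for each of the four (X1, X2) states fills in the full CPT for X3.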

In the situation here, we’ve only got the known graph and the data, and we want to learn how X3 depends on X1 and X2.

In this case, since the values are binary and you have no other information, you do know how to generate samples by using a Bernoulli distribution, which means you can estimate the dependence between X3 and X1, X2 by computing the proportion (that would be the MLE).

and the posterior distribution of rp is, I think, what you want: it gives both the mean and the uncertainty about the rate of X3 under the various states of X1 and X2.
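To sketch what such a posterior over a rate looks like, here is a Beta–Bernoulli conjugate update in plain Python (my own illustration, not necessarily the model from the earlier post; a uniform Beta(1, 1) prior and made-up counts for one (X1, X2) state):

```python
# Beta(1, 1) prior on the rate of X3=1 given X1=1, X2=1; update with counts
a0, b0 = 1., 1.             # uniform prior
k, n = 2, 3                 # made-up counts: X3=1 in 2 of 3 matching rows
a, b = a0 + k, b0 + n - k   # posterior is Beta(a, b)

post_mean = a / (a + b)                           # posterior mean of the rate
post_var = a * b / ((a + b) ** 2 * (a + b + 1.))  # posterior variance
```

With more data the variance shrinks, and with a flat prior the posterior mean approaches the proportion (the MLE) mentioned above.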

(My guess is that tt.ifelse may be more efficient than these multiplications.)