Efficiently generating samples consistent with observed variables to answer conditional queries

Hi @codez266

Bear in mind I don’t know your level of experience with Bayes or causal inference, so these are just some pointers which may or may not help.

It is worth making sure you really know what you want from your queries. For example, conditional distributions like P(X|Z=z) are not the same as interventional distributions (you mentioned the do-operator) like P(X|do(Z=z)). Conditioning asks what observing Z=z tells us about X, while intervening asks what happens to X if we force Z to take the value z. We've got an example covering the do-operator in PyMC, and the difference between conditional and interventional distributions, here: Interventional distributions and graph mutation with the do-operator — PyMC example gallery.
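To make the distinction concrete, here is a minimal NumPy sketch on a hypothetical two-node model X → Z, with X ~ Normal(0, 1) and Z = X + Normal(0, 1) noise. It is not PyMC and not the example gallery's code. Conditioning is approximated crudely by keeping only joint draws whose Z lands near z. Intervening cuts the X → Z edge, so X keeps its own distribution. The names `z_value`, `x_given_z` and `x_do` are illustrative.

```python
import numpy as np

# Hypothetical structural causal model (an assumption for illustration):
#   X ~ Normal(0, 1)
#   Z = X + noise, noise ~ Normal(0, 1)
rng = np.random.default_rng(42)
n = 1_000_000
z_value = 1.5

x = rng.normal(0.0, 1.0, size=n)
z = x + rng.normal(0.0, 1.0, size=n)

# Conditioning, P(X | Z=z): keep only joint draws whose Z falls close to z_value.
# This is a crude rejection approximation to conditioning on a continuous value.
near = np.abs(z - z_value) < 0.05
x_given_z = x[near]

# Intervening, P(X | do(Z=z)): replace Z's mechanism with the constant z_value.
# X is Z's parent, so its own mechanism is untouched and its draws are unchanged.
x_do = x

print(f"E[X | Z={z_value}]     approx {x_given_z.mean():.2f}")
print(f"E[X | do(Z={z_value})] approx {x_do.mean():.2f}")
```

Observing Z=1.5 pulls the mean of X toward roughly 0.75 (analytically z/2 for this model), whereas setting Z=1.5 by intervention leaves X centred on 0: in this graph Z is a child of X, so forcing Z cannot change its parent.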

Check that PyMC is the right package for you. There are many different flavours of Bayes nets. PyMC excels when the variables are (mostly) continuous and you can describe every relationship between a child and its parents, i.e. P(X|Parents(X)). If your variables are exclusively binary or categorical, or you can't specify how parent nodes causally influence their children, there are other packages optimised for those use cases that may be a better fit.
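Concretely, a Bayes net over nodes X_1, …, X_n defines the joint distribution as a product of one conditional distribution per node, each depending only on that node's parents:

$$
P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\big(X_i \mid \mathrm{Parents}(X_i)\big)
$$

In PyMC each factor corresponds to one random variable whose parameters are written in terms of its parents; for example, a child declared as pm.Normal("Z", mu=X, sigma=1) encodes P(Z | X).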

Without knowing the specifics, generating samples that are consistent with observations is not a problem. The following (simplified) model will draw posterior samples of X that are consistent with the observed value of Z:

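import pymc as pm

value_of_Z = 1.5  # hypothetical observed value; substitute your own data
with pm.Model() as model:
    X = pm.Normal("X", mu=0, sigma=1)
    Z = pm.Normal("Z", mu=X, sigma=1, observed=value_of_Z)  # observed= conditions on Z
    idata = pm.sample()  # posterior draws of X given Z=value_of_Z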
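Because this particular model is linear-Gaussian (a Normal prior on X and a Normal likelihood for Z centred on X), the posterior P(X | Z=z) has a closed form, which can be handy for sanity-checking the sampler's draws, e.g. against idata.posterior["X"].mean(). The sketch below uses plain Python; the helper `normal_posterior` and the observed value z = 1.5 are hypothetical, not part of PyMC. It also answers one example conditional query, P(X > 0 | Z=z), from the posterior.

```python
import math


def normal_posterior(z, prior_mu=0.0, prior_sigma=1.0, noise_sigma=1.0):
    """Closed-form posterior of X given Z=z for the conjugate model
    X ~ Normal(prior_mu, prior_sigma), Z | X ~ Normal(X, noise_sigma)."""
    prior_prec = 1.0 / prior_sigma**2
    noise_prec = 1.0 / noise_sigma**2
    post_var = 1.0 / (prior_prec + noise_prec)  # precisions add
    post_mu = post_var * (prior_mu * prior_prec + z * noise_prec)  # precision-weighted mean
    return post_mu, math.sqrt(post_var)


mu, sd = normal_posterior(z=1.5)
print(f"X | Z=1.5 ~ Normal({mu:.3f}, {sd:.3f})")  # Normal(0.750, 0.707)

# Example conditional query: P(X > 0 | Z=1.5), via the Normal CDF written with erf
p_positive = 0.5 * (1.0 + math.erf(mu / (sd * math.sqrt(2.0))))
print(f"P(X > 0 | Z=1.5) = {p_positive:.3f}")
```

With MCMC draws instead of a closed form, the same query is just the fraction of posterior samples of X that are greater than 0.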

But to be honest, it sounds like more information is really needed to understand the problem and answer your questions 👍🏻
