I have created a sizeable Bayesian network of about 80 nodes using PyMC. I want to use the network to answer multiple queries P(X|Z), where evidence Z is provided on some of the variables and I want to know the probability of the query variables X. I could generate a trace of the joint distribution and then use pandas to answer conditional queries, as done here: Bayes Nets, Belief Networks, and PyMC.
However, if my evidence Z is far down in the DAG and the observed value of Z is very infrequent, a lot of the generated samples will have to be discarded because they are not consistent with the evidence on Z. How can I efficiently generate samples that are consistent with the evidence, so that I can answer the conditional queries?
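To make the concern concrete, here is a minimal sketch on a hypothetical three-node binary chain A -> B -> Z (the structure and all probabilities are made up for illustration, not taken from the 80-node network). It compares filtering forward samples against likelihood weighting, a standard alternative in which evidence nodes are not sampled and each draw is instead weighted by the probability of the evidence given its sampled parents. It uses plain NumPy rather than PyMC.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical toy chain A -> B -> Z with binary states (illustrative numbers only).
p_a = 0.3                                # P(A=1)
p_b_given_a = np.array([0.1, 0.6])       # P(B=1 | A=a), indexed by a
p_z_given_b = np.array([0.001, 0.02])    # P(Z=1 | B=b), indexed by b; Z=1 is rare

# Forward-sample the non-evidence nodes.
a = rng.random(n) < p_a
b = rng.random(n) < p_b_given_a[a.astype(int)]

# Rejection: also sample Z, then keep only draws that match the evidence Z=1.
z = rng.random(n) < p_z_given_b[b.astype(int)]
acceptance_rate = z.mean()
p_a_rejection = a[z].mean()

# Likelihood weighting: do not sample Z; weight every draw by P(Z=1 | B=b).
weights = p_z_given_b[b.astype(int)]
p_a_weighted = np.sum(weights * a) / np.sum(weights)

# Exact P(A=1 | Z=1) by summing out B, for comparison.
p_z_given_a = (1 - p_b_given_a) * p_z_given_b[0] + p_b_given_a * p_z_given_b[1]
p_a_exact = p_a * p_z_given_a[1] / (p_a * p_z_given_a[1] + (1 - p_a) * p_z_given_a[0])

print(f"fraction of samples kept by rejection: {acceptance_rate:.4f}")
print(f"P(A=1 | Z=1): rejection={p_a_rejection:.3f}, "
      f"weighted={p_a_weighted:.3f}, exact={p_a_exact:.3f}")
```

In this toy setting rejection keeps well under 1% of the draws, while likelihood weighting uses all of them. Weighting can still degrade when the evidence is very unlikely under most sampled parent configurations, so it is a sketch of the idea rather than a recommendation for the full network.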
For each conditional query, do I define a new model where I use the observed parameter on each of the observed variables to set it to the given value, and then sample? Or is there a more efficient way to generate evidence-consistent samples to answer conditional queries?
I was also exploring the use of the do operator, but it seems it's good for sampling dependents when evidence is provided on parents, not the other way around.
Hi @codez266
Bear in mind I don’t know your level of experience with Bayes or causal inference, so these are just some pointers which may or may not help.
It is worth making sure you really know what you want from your queries. For example, conditional distributions like P(X|Z=z) are not the same as interventional distributions (you mentioned the do-operator) like P(X|do(Z=z)). We've got an example covering the do-operator in PyMC and the difference between conditional and interventional distributions here: Interventional distributions and graph mutation with the do-operator — PyMC example gallery.
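As a small illustration of that difference, here is a sketch on a hypothetical two-node binary network X -> Z with made-up probabilities, computed exactly in plain Python rather than with PyMC. Conditioning on the child Z updates beliefs about the parent X through Bayes' rule, whereas intervening on Z cuts the X -> Z edge and leaves the distribution of X unchanged.

```python
# Hypothetical two-node network X -> Z with binary states (illustrative numbers only).
p_x = 0.2                         # P(X=1)
p_z_given_x = {0: 0.1, 1: 0.9}    # P(Z=1 | X=x)

# Conditional query: observing Z=1 updates beliefs about its parent X (Bayes' rule).
p_z = p_x * p_z_given_x[1] + (1 - p_x) * p_z_given_x[0]
p_x_given_z = p_x * p_z_given_x[1] / p_z

# Interventional query: do(Z=1) removes the X -> Z edge, so X keeps its prior.
p_x_do_z = p_x

print(f"P(X=1 | Z=1)     = {p_x_given_z:.3f}")
print(f"P(X=1 | do(Z=1)) = {p_x_do_z:.3f}")
```

This is consistent with the observation in the question that the do-operator is informative about descendants of the intervened node but not about its ancestors.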
Check PyMC is the right package for you. There are many different flavours of Bayes Nets. PyMC excels in the case of (mostly) continuous variables where you can describe all the relationships between children and their parents, i.e. P(X|Parents(X)). If you have exclusively binary or categorical variables, or don't have the structural causal relations that parent nodes have on child nodes, then there are other packages out there which are optimised for those use cases.
Without any of the specifics, generating samples that are consistent with observations is not a problem. The following (simplified) model does exactly that, drawing samples of X that are consistent with the observed value of Z:
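import pymc as pm
value_of_Z = 1.5  # placeholder; substitute the observed value(s) of Z
with pm.Model() as model:
    X = pm.Normal("X", mu=0, sigma=1)
    # `observed` conditions on Z, so sampling targets P(X | Z = value_of_Z)
    Z = pm.Normal("Z", mu=X, sigma=1, observed=value_of_Z)
    idata = pm.sample()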
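For this particular two-node Gaussian model the posterior is available in closed form, which gives a handy sanity check on the samples. The derivation below assumes a single scalar observation, written z here (a symbol introduced for this note), and uses the mean/variance convention for the normal distribution, whereas PyMC's sigma argument is a standard deviation.

```latex
p(X \mid Z = z) \;\propto\; \mathcal{N}(X \mid 0, 1)\,\mathcal{N}(z \mid X, 1)
\;\propto\; \exp\!\left(-\tfrac{1}{2}X^{2} - \tfrac{1}{2}(z - X)^{2}\right)
\;\propto\; \exp\!\left(-\left(X - \tfrac{z}{2}\right)^{2}\right)
\quad\Longrightarrow\quad
X \mid Z = z \;\sim\; \mathcal{N}\!\left(\text{mean} = \tfrac{z}{2},\ \text{variance} = \tfrac{1}{2}\right)
```

So the posterior draws of X should be centred near half the observed value, with a standard deviation of roughly 0.71.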
But to be honest, it sounds like more information is really needed to understand the problem and answer your questions.
Hi @drbenvincent, thanks for your reply! I took some time to read more on causal inference, and I'm clear now that I'm looking for the posterior distribution under observations, not counterfactuals.
Specifically, the problem I am studying is maximizing information gain under a budget. I want to exploit the information structure in the problem using a connected graph and define an adaptive selection policy. Here’s an illustration.
Consider a set of sensors S = \{S_1, S_2, \ldots, S_n\}, each providing some measurements of interest. The sensors form a DAG where each sensor has some parents and could itself be a parent of other sensors. Each sensor depends on readings from its parents to work correctly. Each sensor also has an independent probability of failure ps_{i}. Therefore, a sensor can fail either due to its own independent failure, or because any of its parents have failed. One by one, I can select a sensor to probe and check whether it's working correctly. We can assume that the probing action yields a precise measurement. I have to maximize my knowledge about the working status of the sensors in a minimal number of probes. I am modeling the failure probability of each sensor as a noisy AND of its parents plus its own probability of failure.
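A minimal sketch of one possible reading of this sensor model, in plain NumPy rather than PyMC: a sensor works only if it does not fail on its own and all of its parents work. The four-sensor DAG, the failure probabilities, and the helper names `sample_states`, `ancestors` and `exact_p_working` are all hypothetical, and a noisier AND (where a failed parent only sometimes breaks the child) would need an extra parameter per edge.

```python
import numpy as np

# Hypothetical 4-sensor DAG; keys are listed in topological order.
parents = {"S1": [], "S2": ["S1"], "S3": ["S1"], "S4": ["S2", "S3"]}
p_fail = {"S1": 0.05, "S2": 0.10, "S3": 0.20, "S4": 0.10}  # independent failure probs


def sample_states(parents, p_fail, n, rng):
    """Forward-sample working states (True = working) for n independent draws."""
    works = {}
    for node, node_parents in parents.items():  # relies on topological key order
        own_ok = rng.random(n) >= p_fail[node]
        parents_ok = np.ones(n, dtype=bool)
        for parent in node_parents:
            parents_ok &= works[parent]
        works[node] = own_ok & parents_ok
    return works


def ancestors(node, parents):
    """Return the set of all ancestors of `node`."""
    seen, stack = set(), list(parents[node])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def exact_p_working(node, parents, p_fail):
    """Under this reading, a node works iff it and all its ancestors avoid own failure."""
    prob = 1.0 - p_fail[node]
    for ancestor in ancestors(node, parents):
        prob *= 1.0 - p_fail[ancestor]
    return prob


rng = np.random.default_rng(1)
draws = sample_states(parents, p_fail, 100_000, rng)
for node in parents:
    print(node, round(draws[node].mean(), 3), round(exact_p_working(node, parents, p_fail), 3))
```

The closed-form marginal only holds for this deterministic-AND reading; it is included as a check on the sampler, not as a general result for noisy gates.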
Thus, I have Bernoulli sensor states (there could be more states in a more complicated version of the problem) and a well-defined description of all the relationships between children and their parents, i.e., P(X|Parents(X)). The adaptive information selection would rely on the objective of selecting nodes that provide the highest reduction in entropy of the knowledge of states over the graph. Intuitively, if I pick a node lower in the DAG and it is observed to have failed, I can tell that its descendants won't work, but I still have to explore its parents.
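The entropy-reduction objective can be sketched by exact enumeration on a toy graph. The example below assumes a hypothetical four-sensor DAG, made-up failure probabilities, and a deterministic-AND reading of the model (a sensor works only if it does not fail on its own and all parents work). The function names are illustrative. Enumeration costs 2^n, so this is only feasible for small graphs and is not a method for the 80-node network.

```python
import itertools
import math

# Hypothetical 4-sensor DAG; keys are listed in topological order.
parents = {"S1": [], "S2": ["S1"], "S3": ["S1"], "S4": ["S2", "S3"]}
p_fail = {"S1": 0.05, "S2": 0.10, "S3": 0.20, "S4": 0.10}


def joint_distribution(parents, p_fail):
    """Enumerate own-failure patterns and accumulate the joint over working states."""
    nodes = list(parents)
    joint = {}
    for own_ok in itertools.product([True, False], repeat=len(nodes)):
        prob, works = 1.0, {}
        for node, ok in zip(nodes, own_ok):
            prob *= (1.0 - p_fail[node]) if ok else p_fail[node]
            works[node] = ok and all(works[p] for p in parents[node])
        state = tuple(works[node] for node in nodes)
        joint[state] = joint.get(state, 0.0) + prob
    return nodes, joint


def entropy(dist):
    """Shannon entropy in bits of a (possibly unnormalised) distribution."""
    total = sum(dist.values())
    return -sum((p / total) * math.log2(p / total) for p in dist.values() if p > 0)


def expected_information_gain(nodes, joint, probe):
    """Prior joint entropy minus expected joint entropy after a precise probe."""
    idx = nodes.index(probe)
    expected_posterior = 0.0
    for value in (True, False):
        consistent = {s: p for s, p in joint.items() if s[idx] == value}
        p_value = sum(consistent.values())
        if p_value > 0:
            expected_posterior += p_value * entropy(consistent)
    return entropy(joint) - expected_posterior


nodes, joint = joint_distribution(parents, p_fail)
gains = {node: expected_information_gain(nodes, joint, node) for node in nodes}
best = max(gains, key=gains.get)
print(gains, "-> probe first:", best)
```

Because a precise probe is a deterministic function of the joint state, the one-step gain about the joint equals the marginal entropy of the probed sensor, so the greedy first probe is the sensor whose working probability is closest to 0.5. After each probe, the joint would be restricted to the consistent states and the gains recomputed.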
I see that you mentioned that PyMC excels for continuous variables, but I ran some small-scale experiments on toy graphs (a DAG with about 6 nodes and Bernoulli states). I defined some dependencies between the nodes, and could see that observing different nodes reduced the uncertainty about working states over the graph by different amounts, depending on each node's position in the graph. However, I have yet to see whether PyMC will work for a realistic graph in my application. Any thoughts appreciated!