Possibly naive question, but is there a way to do structure learning of Bayesian networks with PyMC3? One possibility might be to define a score based on the overall model logp and then run an optimizer to maximize / minimize that score. The logp, I imagine, can be directly obtained from pymc3?
Thoughts? Other approaches?
2 Likes
If the structure is represented by some discrete parameters, doing inference is going to be a bit challenging. But otherwise you should be able to set up the model in pymc3.
Sorry, not quite sure what you meant by “structure represented by discrete parameters”? Are you talking about discrete RVs in the DAG?
I guess I didn’t get what exactly do you mean by structure learning - are you trying to learn the structure of a Bayesian network like how the nodes are connected?
Yep, the topology of the network.
In that case, how would you represent the topology? An adjacency matrix? It really depending on your data and set up so if you could go into a bit more detail it would helps.
Ah, I see. I haven’t thought about it, but adjacency matrices may be one way. My problem is essentially: I have 2 bayesian networks (DAGs) with all continuous variables. I want to start coupling them by creating an edge (and a corresponding gaussian probability distribution) between say node a1 of DAG1 and node a2 of DAG2. But the choice of a1 and a2 is arbitrary. So eventually, I want to be able to form all possible connections between nodes of the DAGs and optimize the inter-connection topology given observations at one or more nodes of one or both DAGs (the topology of the DAGs themselves may remain constant reducing the search space of overall graph topology a little).
Are all the nodes in both DAGs fully observed?
No. There will be hidden nodes in both DAGs. What I meant is there will be at least one observed node in the whole system. So both DAGs might have observed nodes, but a nonzero number of hidden nodes too.
Hey, did you dig further into that? I have a small reference implementation for structure learning when all variables are observed and would be interested to see how you do it when latent variables are involved.