I’m pretty new to Bayesian networks, but I was able to figure out a few things in Weka with the GUI and the BayesNet algorithm. I’m trying to do the same thing programmatically in Python, but I wasn’t able to with the Weka wrapper. What I’d like to do is learn categories from a term-document matrix with about 1300 columns (each is a word). After I have learned my model, I’d like to be able to do two things: 1.) set evidence/observation on a category and see how the per-word probabilities change, and 2.) send in a 1300-column vector and get a prediction of the category it belongs to. Are these things possible using PyMC? Is there a tutorial or documentation doing something similar? I’ve looked and haven’t found anything yet. Thank you very much!
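To make the two tasks concrete, here is a minimal sketch in plain Python rather than PyMC or Weka. It assumes the simplest possible network structure, a single class node pointing at every word (a naive Bayes layout), which may not match what Weka's BayesNet learns. The toy vocabulary, the data, and the helper names `fit` and `predict` are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def fit(rows, labels, alpha=1.0):
    """Estimate P(class) and P(word present | class) by counting, with Laplace smoothing."""
    n_words = len(rows[0])
    class_counts = Counter(labels)
    present = defaultdict(lambda: [0] * n_words)
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            if value > 0:
                present[label][j] += 1
    prior = {c: n / len(labels) for c, n in class_counts.items()}
    cond = {
        c: [(present[c][j] + alpha) / (class_counts[c] + 2 * alpha)
            for j in range(n_words)]
        for c in class_counts
    }
    return prior, cond

def predict(row, prior, cond):
    """Return P(class | row), accumulated in log space and then normalized."""
    log_post = {}
    for c in prior:
        total = math.log(prior[c])
        for j, value in enumerate(row):
            p = cond[c][j]
            total += math.log(p if value > 0 else 1.0 - p)
        log_post[c] = total
    top = max(log_post.values())
    norm = top + math.log(sum(math.exp(v - top) for v in log_post.values()))
    return {c: math.exp(v - norm) for c, v in log_post.items()}

# Hypothetical toy term-document matrix: 4 documents, 4 words.
vocab = ["goal", "match", "vote", "election"]
rows = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 1, 1, 0]]
labels = ["sports", "sports", "politics", "politics"]

prior, cond = fit(rows, labels)

# Task 1: fix the category and read off the per-word probabilities.
word_given_sports = dict(zip(vocab, cond["sports"]))

# Task 2: send in a word vector and get a distribution over categories.
posterior = predict([1, 1, 0, 0], prior, cond)
best = max(posterior, key=posterior.get)
```

With this structure, setting evidence on the category is just a lookup in the learned conditional table, and prediction is Bayes' rule applied across all words. A richer network with edges between words would need a proper inference routine instead.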
Sounds like you are describing Latent Dirichlet Allocation (LDA): https://docs.pymc.io/notebooks/lda-advi-aevb.html
Would you know about setting up just a BayesNet-type network? I understand the very basics of it, and I can see how to explain it. It would be a good exercise to start from that and then compare it with other ones. I’m familiar with Weka and would like to get to something like this (https://www.cs.waikato.ac.nz/~remco/weka_bn/node20.html), but programmatically. I’d like to learn a network from data, then set some evidence, and be able to see how it changed.
I have found this library (pgmpy) that is able to read the same XMLBIF files that Weka creates/reads (https://programtalk.com/vs2/?source=python/10105/pgmpy/pgmpy/tests/test_readwrite/test_XMLBIF.py).
I have never worked with Weka before. @ericmjl works quite a lot with network-type data, maybe he knows?
I’m afraid I’m not able to help in this particular case. While I have heard the term BayesNets, I’ve never formally learned about them (either self-studied or in class). It may be that I know something about them, but I’m just unaware of it at the moment.
Thank you for your replies. I may need to try PyMC later. I was able to get pretty far with the Weka wrapper; here’s a link with some working code:
It looks like Weka has an underflow issue with large graphs…
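As a generic illustration of this kind of underflow (not a diagnosis of what Weka does internally), multiplying on the order of 1300 small probabilities in double precision collapses to zero, while summing their logarithms stays well behaved. The numbers and the helper `log_sum_exp` below are hypothetical.

```python
import math

# 1300 per-word likelihood terms, each fairly small (made-up values).
probs = [0.01] * 1300

# Direct multiplication: the true value is 1e-2600, far below the smallest double.
naive_product = 1.0
for p in probs:
    naive_product *= p

# Same quantity in log space: a large negative but perfectly representable number.
log_score = sum(math.log(p) for p in probs)

def log_sum_exp(values):
    """Compute log(sum(exp(v))) without overflow or underflow."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

# Two hypothetical classes whose unnormalized scores differ by a factor of 3.
log_scores = {"a": log_score, "b": log_score + math.log(3.0)}
norm = log_sum_exp(list(log_scores.values()))
posterior = {c: math.exp(v - norm) for c, v in log_scores.items()}
```

Here `naive_product` ends up as exactly `0.0`, so any normalization based on it would divide zero by zero, whereas the log-space version still recovers the 1:3 ratio between the two classes.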
So I’d like to try PyMC. Is it possible to learn a Bayesian network from data like the network shown in the first post here:
It sounds like I could then set my observation to a specific value in my class and then get the most influential attributes, is that correct? Is there a good tutorial for this? What are the right terms to use when talking about PyMC?
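One possible reading of "most influential attributes" is: which word probabilities move the most once the class is observed. The sketch below assumes a class-to-words structure with already-estimated conditional tables and ranks words by the log ratio of P(word | class) to the marginal P(word). Both the measure and all the numbers are assumptions for illustration, not something PyMC provides out of the box.

```python
import math

# Hypothetical learned tables: P(class) and P(word present | class).
prior = {"sports": 0.5, "politics": 0.5}
cond = {
    "sports":   {"goal": 0.75, "match": 0.5, "vote": 0.25, "election": 0.3},
    "politics": {"goal": 0.25, "match": 0.5, "vote": 0.75, "election": 0.5},
}
vocab = ["goal", "match", "vote", "election"]

# Marginal P(word) before any evidence: average over classes, weighted by the prior.
marginal = {w: sum(prior[c] * cond[c][w] for c in prior) for w in vocab}

def rank_words(observed_class):
    """Rank words by how far observing the class shifts their probability."""
    shift = {w: math.log(cond[observed_class][w] / marginal[w]) for w in vocab}
    ranked = sorted(vocab, key=lambda w: abs(shift[w]), reverse=True)
    return ranked, shift

ranked, shift = rank_words("sports")
```

A positive shift means the word becomes more likely once the class is observed, a negative shift means less likely, and a shift of zero means the word carries no information about that class.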