Under the hood, PyMC is constructing computational graphs (provided by PyTensor) that express the various calculations needed to do sampling, etc. In this graph, there are nodes that represent operations, nodes that represent data (e.g., observations), etc. The deep dive into how PyMC uses PyTensor can be found here. For prettier visualizations of the computational graphs, you can check out the d3viz functionality provided by PyTensor.
Hm. I suppose it would make more sense for the documentation examples you posted to be referring to the graph depicted in the graphviz representation. Either way, nodes are more than just the visualization (the visualizations just depict the underlying computational graph in one way or another). Maybe it helps to know that, for any PyMC model, you can ask for model.named_vars, model.free_RVs, model.observed_RVs, and model.deterministics. Each gives a subset of what “node” might be referring to. @ricardoV94 might have a more detailed answer.
The use of the word “node” in the docs probably comes from the autodiff literature, not from statistics at all. I found these course notes from a French deep learning program that just use the word “nœud”, which I guess is correct from a graph-theoretical perspective but not really very illuminating. The French Wikipedia entry for automatic differentiation also uses “nœud”.
Strictly speaking, “node” should refer to any vertex on the computational directed acyclic graph that represents your model, flowing from the definition of variables to the final log-likelihood value. These can be variables, numbers, operations, or (in the case of PyTensor) random variables. You can see some examples of the types of full computational graphs that PyMC generates for you automatically here: a “node” is just anything on that graph.
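To make “any vertex is a node” concrete, here is a toy, pure-Python sketch. The Node class and the all_nodes function are hypothetical. They are not how PyTensor represents graphs internally, since PyTensor separates variables from Apply nodes. The point is only that variables, numbers, operations and random variables all end up as vertices of one DAG leading to a log-likelihood term.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity-based hashing, so nodes can go in a set
class Node:
    kind: str   # "variable", "constant", "operation" or "random_variable"
    label: str
    inputs: list = field(default_factory=list)  # parent nodes (incoming edges)


def all_nodes(output):
    """Return every node upstream of `output` (inclusive), parents before children."""
    seen = set()
    order = []

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for parent in node.inputs:
            visit(parent)
        order.append(node)

    visit(output)
    return order


# Toy graph for the Normal log-density of one observation, up to additive constants:
# logp = -0.5 * ((y - mu) / sigma) ** 2
y = Node("variable", "y")                # observed data
mu = Node("random_variable", "mu")       # unknown parameter
sigma = Node("constant", "2.0")          # fixed number
diff = Node("operation", "sub", [y, mu])
scaled = Node("operation", "div", [diff, sigma])
sq = Node("operation", "square", [scaled])
half = Node("constant", "-0.5")
logp = Node("operation", "mul", [half, sq])

nodes = all_nodes(logp)
print([n.label for n in nodes])
print(Counter(n.kind for n in nodes))
```

Every one of the eight vertices, whatever its kind, is a “node”. The traversal does not care which kind it is visiting.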
Practically speaking, I think you could just find-replace “node” with “variable” in the examples you posted from the docs. It wouldn’t be perfect (“deterministic variable” seems like an oxymoron) but at least IMO it makes more intuitive sense.
Well, thanks to all, but it still seems a little weird to me (though maybe I’m being too picky) that the word “node” isn’t more precisely defined.
For example, some quotes come from the PyMC documentation, where the word “node” is used many times.
Another comes from the PyMC3 docs, for example this quote: “Here, we will implement a general routine to draw samples from the observed nodes of a model.”
So, suppose someone reads in some doc “…from the observed nodes of a model”. What spontaneously comes to mind regarding “nodes”?
And if someone next to him then asks, “What does the guy mean by observed nodes?”, what would his answer be? Maybe something like: “Oh, he just means…” (means what?)
Well, I fully understand that these kinds of questions are not essential and I wouldn’t want to bother you with that; it’s just that I’ve come across this term quite often, and I would have appreciated knowing exactly what people are talking about. That’s all…
I think part of the issue is that terms like “variable”, “parameter”, and “random variable” are often used/useful, but you can’t use any of them universally. For example, it’s a bit odd to talk about “observed parameters”. But “observed variable” is ok. The “problem” is that, behind the scenes, PyMC doesn’t really care too much about the differences between these distinct concepts. Whether a node is observed or not is important, but otherwise can be “the same kind of thing” computationally.
The term “node” comes from the graphical models literature, whereby graphs are composed of nodes connected by edges. We use it as a general term that would include observed and unobserved random variables, deterministics, or factor potentials in the model DAG.
If you feel so inclined, please create an issue to have it added to the glossary, along with anything else you find that seems poorly defined (or undefined).
Note that this pymc3-testing URL hosts docs generated from a fork that is years out of date and completely independent of the PyMC project (like all forks). I’d recommend reading the official docs, Discourse, and up-to-date blogs, not the content there.