Browsing docs about Bayesian Inference using PyMC models, I frequently come across the word node, as in the following examples, taken from here and there:

Once you have defined the nodes that make up your model…

Notice that the stochastic pm.Binomial has been replaced with a deterministic node that simulates values using pm.rbinomial and the unknown parameters theta.

…is basically a mixture of Categorical nodes with Dirichlet Priors…

…the node should be specified as deterministic…

…you can basically set up anything you want on the nodes and refer to them.

Here, we will implement a general routine to draw samples from the observed nodes of a model.

Finally we will discuss a couple of other useful features: custom distributions and arbitrary deterministic nodes.

And so on…

Unfortunately, I have never found a sentence that simply defines what is meant by node in this kind of context.

Hence my question: how would you define what a node is in the context of PyMC models?

Incidentally, has anyone come across an equivalent term in French in their reading?

Under the hood, PyMC is constructing computational graphs (provided by PyTensor) that express the various calculations needed to do sampling, etc. In this graph, there are nodes that represent operations, nodes that represent data (e.g., observations), etc. The deep dive into how PyMC uses PyTensor can be found here. For prettier visualizations of the computational graphs, you can check out the d3viz functionality provided by PyTensor.
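To make the idea concrete, here is a toy sketch in plain Python (illustrative only; PyTensor's real graph objects are much richer and these class names are made up). The point is that everything in such a graph is a "node": constants and data as well as operations.

```python
# Toy sketch of a computational graph (NOT PyTensor's actual API).
# Every object below is a "node": constants, data, and operations alike.

class Node:
    def eval(self):
        raise NotImplementedError

class Const(Node):
    """A data node, e.g. an observation or a fixed parameter."""
    def __init__(self, value):
        self.value = value
    def eval(self):
        return self.value

class Add(Node):
    """An operation node whose inputs are two parent nodes."""
    def __init__(self, left, right):
        self.left, self.right = left, right
    def eval(self):
        return self.left.eval() + self.right.eval()

class Mul(Node):
    """Another operation node."""
    def __init__(self, left, right):
        self.left, self.right = left, right
    def eval(self):
        return self.left.eval() * self.right.eval()

# The graph for (2 + 3) * 4 contains five nodes:
# three Const nodes, one Add node, and one Mul node.
graph = Mul(Add(Const(2), Const(3)), Const(4))
print(graph.eval())  # -> 20
```

PyTensor builds graphs of this general shape (with symbolic inputs, random variables, and many more operation types), and PyMC's sampling machinery walks and rewrites them.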

Hm. I suppose it would make more sense for the documentation examples you posted to be referring to the graph depicted in the graphviz representation. Either way, nodes are more than just the visualization (the visualizations just render the underlying computational graph in one way or another). Maybe it helps to know that, for any PyMC model, you can ask for model.named_vars, model.free_RVs, model.observed_RVs, and model.deterministics. Each gives a subset of what node might be referring to. @ricardoV94 might have a more detailed answer.

The use of the word “node” in the docs probably comes from the autodiff literature, not from statistics at all. I found these course notes from a French deep learning program that just use the word “nœud”, which I guess is correct from a graph-theoretical perspective but not really very illuminating. The French Wikipedia entry for automatic differentiation also uses “nœud”.

Strictly speaking, “node” should refer to any vertex on the computational directed acyclic graph that represents your model, flowing from definition of variables to the final log-likelihood value. These can be variables, numbers, operations, or (in the case of pytensor) random variables. You can see some examples of the types of full computational graphs that PyMC generates for you automatically here – a “node” is just anything on that graph.

Practically speaking, I think you could just find-replace “node” with “variable” in the examples you posted from the docs. It wouldn’t be perfect (“deterministic variable” seems like an oxymoron) but at least IMO it makes more intuitive sense.

Well, thanks to all, but it still seems a little weird to me (maybe I’m being too picky) that the word “node” isn’t more precisely defined.

For example, some quotes come from the PyMC Documentation, where the word “node” is used many times.

Another one comes from the PyMC3 documentation. This one, as an example, corresponds to the quote: Here, we will implement a general routine to draw samples from the observed nodes of a model.

So, suppose someone reads in some doc: from the observed nodes of a model. What comes spontaneously to their mind, regarding nodes?

And if someone next to them then asks: “what does the author mean by observed nodes?”, what could the answer be? Maybe something like: “oh, he just wants to talk about…”? (about what?)

Well, I fully understand that these kinds of questions are not essential and I wouldn’t want to bother you with that; it’s just that I’ve come across this term quite often, and I would have appreciated knowing exactly what people are talking about. That’s all…

I think part of the issue is that terms like “variable”, “parameter”, and “random variable” are often used/useful, but you can’t use any of them universally. For example, it’s a bit odd to talk about “observed parameters”. But “observed variable” is ok. The “problem” is that, behind the scenes, PyMC doesn’t really care too much about the differences between these distinct concepts. Whether a node is observed or not is important, but otherwise can be “the same kind of thing” computationally.
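As a toy illustration of that last point (made-up class, not PyMC's actual internals): an observed node and an unobserved node can be the same kind of object computationally, with observedness reduced to a flag.

```python
# Illustrative sketch only -- NOT how PyMC represents variables internally.
# The point: observed and unobserved nodes can be the same kind of object;
# whether data is attached is just metadata on the node.

class RandomVariableNode:
    def __init__(self, name, observed=None):
        self.name = name
        self.observed = observed  # attached data, or None if unobserved

    @property
    def is_observed(self):
        return self.observed is not None

theta = RandomVariableNode("theta")                  # an unobserved (free) node
y = RandomVariableNode("y", observed=[1, 0, 1, 1])   # an observed node

# Both are the same type; only the flag differs.
assert type(theta) is type(y)
print(theta.is_observed, y.is_observed)  # -> False True
```

This is why a blanket term like “node” is convenient: it covers both cases without committing to “parameter”, “variable”, or “observation”.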

The term “node” comes from the graphical models literature, whereby graphs are composed of nodes connected by edges. We use it as a general term that would include observed and unobserved random variables, deterministics, or factor potentials in the model DAG.

If you feel so inclined, please create an issue to have it added to the glossary, along with anything else you find that seems poorly-(or un-) defined.

Thank you for entering the discussion, because this term is indeed widely used in your document PyMC Documentation which I quoted above.

I think I now have a sufficiently clear explanation, with all the remarks provided, of this term node, which previously remained so ambiguous to me…

And finally, I thank you for your advice, but I don’t know how to create an issue; I still don’t feel very comfortable with the features of discourse.pymc.io.

Note that this pymc3-testing url hosts docs generated from a fork that is years outdated and is completely independent of the PyMC project (like all forks). I’d recommend reading the official docs, discourse and up to date blogs, not the content there.