Browsing docs about Bayesian Inference using PyMC models, I frequently come across the word node, as in the following examples, taken from here and there:

Once you have defined the nodes that make up your model…

Notice that the stochastic pm.Binomial has been replaced with a deterministic node that simulates values using pm.rbinomial and the unknown parameters theta.

…is basically a mixture of Categorical nodes with Dirichlet Priors…

…the node should be specified as deterministic…

…you can basically set up anything you want on the nodes and refer to them.

Here, we will implement a general routine to draw samples from the observed nodes of a model.

Finally we will discuss a couple of other useful features: custom distributions and arbitrary deterministic nodes.

And so on…

Unfortunately, I have never found a sentence that simply defines what is meant by node in this kind of context.

Hence my question: how would you define what a node is in the context of PyMC models?

Incidentally, has anyone come across an equivalent term in French in their reading?

Under the hood, PyMC is constructing computational graphs (provided by PyTensor) that express the various calculations needed to do sampling, etc. In this graph, there are nodes that represent operations, nodes that represent data (e.g., observations), etc. The deep dive into how PyMC uses PyTensor can be found here. For prettier visualizations of the computational graphs, you can check out the d3viz functionality provided by PyTensor.
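To make the idea concrete, here is a toy sketch in plain Python (illustrative only; PyTensor's real graph objects are much richer and these class names are made up). The point is that everything in such a graph is a "node": constants and data as well as operations.

```python
# Toy sketch of a computational graph (NOT PyTensor's actual API).
# Every object below is a "node": constants, data, and operations alike.

class Node:
    def eval(self):
        raise NotImplementedError

class Const(Node):
    """A data node, e.g. an observation or a fixed parameter."""
    def __init__(self, value):
        self.value = value
    def eval(self):
        return self.value

class Add(Node):
    """An operation node whose inputs are two parent nodes."""
    def __init__(self, left, right):
        self.left, self.right = left, right
    def eval(self):
        return self.left.eval() + self.right.eval()

class Mul(Node):
    """Another operation node."""
    def __init__(self, left, right):
        self.left, self.right = left, right
    def eval(self):
        return self.left.eval() * self.right.eval()

# The graph for (2 + 3) * 4 contains five nodes:
# three Const nodes, one Add node, and one Mul node.
graph = Mul(Add(Const(2), Const(3)), Const(4))
print(graph.eval())  # -> 20
```

PyTensor builds graphs of this general shape (with symbolic inputs, random variables, and many more operation types), and PyMC's sampling machinery walks and rewrites them.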

Hm. I suppose it would make more sense for the documentation examples you posted to be referring to the graph depicted in the graphviz representation. Either way, nodes are more than just the visualization (the visualizations just render the underlying computational graph in one way or another). Maybe it helps to know that, for any PyMC model, you can ask for model.named_vars, model.free_RVs, model.observed_RVs, and model.deterministics. Each gives a subset of what node might be referring to. @ricardoV94 might have a more detailed answer.

The use of the word “node” in the docs probably comes from the autodiff literature, not from statistics at all. I found these course notes from a French deep learning program that just use the word “nœud”, which I guess is correct from a graph-theoretical perspective but not really very illuminating. The French Wikipedia entry for automatic differentiation also uses “nœud”.

Strictly speaking, “node” should refer to any vertex on the computational directed acyclic graph that represents your model, flowing from definition of variables to the final log-likelihood value. These can be variables, numbers, operations, or (in the case of pytensor) random variables. You can see some examples of the types of full computational graphs that PyMC generates for you automatically here – a “node” is just anything on that graph.

Practically speaking, I think you could just find-replace “node” with “variable” in the examples you posted from the docs. It wouldn’t be perfect (“deterministic variable” seems like an oxymoron) but at least IMO it makes more intuitive sense.

Well, thanks to all, but it still seems a little weird to me (maybe I’m being too picky) that the word “node” isn’t more precisely defined.

For example, some quotes come from the PyMC Documentation, where the word “node” is used many times.

Another one comes from the PyMC3 documentation. This one, as an example, corresponds to the quote: Here, we will implement a general routine to draw samples from the observed nodes of a model.

So, suppose someone reads in some doc: from the observed nodes of a model. What comes spontaneously to their mind, regarding nodes?

And if someone next to them then asks: “what does the author mean by observed nodes?”, what could the answer be? Maybe something like: “oh, he just wants to talk about…”? (about what?)

Well, I fully understand that these kinds of questions are not essential and I wouldn’t want to bother you with that; it’s just that I’ve come across this term quite often, and I would have appreciated knowing exactly what people are talking about. That’s all…

I think part of the issue is that terms like “variable”, “parameter”, and “random variable” are often used/useful, but you can’t use any of them universally. For example, it’s a bit odd to talk about “observed parameters”. But “observed variable” is ok. The “problem” is that, behind the scenes, PyMC doesn’t really care too much about the differences between these distinct concepts. Whether a node is observed or not is important, but otherwise can be “the same kind of thing” computationally.
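As a toy illustration of that last point (made-up class, not PyMC's actual internals): an observed node and an unobserved node can be the same kind of object computationally, with observedness reduced to a flag.

```python
# Illustrative sketch only -- NOT how PyMC represents variables internally.
# The point: observed and unobserved nodes can be the same kind of object;
# whether data is attached is just metadata on the node.

class RandomVariableNode:
    def __init__(self, name, observed=None):
        self.name = name
        self.observed = observed  # attached data, or None if unobserved

    @property
    def is_observed(self):
        return self.observed is not None

theta = RandomVariableNode("theta")                  # an unobserved (free) node
y = RandomVariableNode("y", observed=[1, 0, 1, 1])   # an observed node

# Both are the same type; only the flag differs.
assert type(theta) is type(y)
print(theta.is_observed, y.is_observed)  # -> False True
```

This is why a blanket term like “node” is convenient: it covers both cases without committing to “parameter”, “variable”, or “observation”.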

The term “node” comes from the graphical models literature, whereby graphs are composed of nodes connected by edges. We use it as a general term that would include observed and unobserved random variables, deterministics, or factor potentials in the model DAG.

If you feel so inclined, please create an issue to have it added to the glossary, along with anything else you find that seems poorly-(or un-) defined.

Thank you for entering the discussion, because this term is indeed widely used in your document PyMC Documentation which I quoted above.

I think I now have a sufficiently clear explanation, with all the remarks provided, of this term node, which previously remained so ambiguous to me…

And finally, I thank you for your advice, but I don’t know how to create an issue; I still don’t feel very comfortable with the features of discourse.pymc.io.

Note that this pymc3-testing url hosts docs generated from a fork that is years outdated and is completely independent of the PyMC project (like all forks). I’d recommend reading the official docs, discourse and up to date blogs, not the content there.