Semantics / description of Kruschke diagrams (model_to_graphviz)

These are usually called “Bayesian networks” or “directed acyclic graphical models” (DAGs); they provide a way to define a joint distribution with density p(\theta, y) over parameters \theta and data y. Ancestral sampling (aka forward sampling) generates random draws of \theta, y from that joint model by sampling each node conditional on its parents in topological order, which you can always do with a directed acyclic graphical model.
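Here’s a minimal sketch of ancestral sampling in plain NumPy for a toy model y_n \sim \textrm{normal}(\mu, \sigma); the priors and names are just made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1234)

def forward_sample(N=50):
    # Walk the DAG in topological order: parameters first, then data.
    mu = rng.normal(0.0, 1.0)            # mu ~ normal(0, 1)         (assumed prior)
    sigma = abs(rng.normal(0.0, 1.0))    # sigma ~ normal+(0, 1)     (assumed prior)
    y = rng.normal(mu, sigma, size=N)    # y_n ~ normal(mu, sigma)   (likelihood)
    return mu, sigma, y

mu, sigma, y = forward_sample()
```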

The observed/unobserved and random/constant distinctions are orthogonal. Usually you have boxes for constants and circles for random variables, with circles optionally shaded if they’re observed. I think that’s the convention here.

For example, in a linear regression y_n \sim \textrm{normal}(\alpha + \beta \cdot x_n, \sigma), you typically treat y_n, \alpha, \beta, \sigma as random variables in that they are assigned probability distributions, with x_n being the only constant or non-random variable (other than, say, the size of y). Then we treat y_n, x_n as observed, but \alpha, \beta, \sigma as unobserved. Gelman and Hill use the term modeled data for the observed outcomes y_n in this example and unmodeled data for the observed covariates x_n (assuming they’re not given a distribution, for example to deal with missing data). In ML, when there is unmodeled data other than constant sizes, such as the covariates x in a regression, they tend to call it a “discriminative” model.
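As a rough sketch of how this split plays out with model_to_graphviz (assuming PyMC; the priors, data, and variable names here are made up for illustration):

```python
import numpy as np
import pymc as pm

# made-up covariates and outcomes, just for illustration
x = np.linspace(0.0, 1.0, 20)
y_obs = 1.5 + 2.0 * x + np.random.default_rng(0).normal(0.0, 0.3, size=20)

with pm.Model() as model:
    alpha = pm.Normal("alpha", mu=0.0, sigma=10.0)   # random, unobserved
    beta = pm.Normal("beta", mu=0.0, sigma=10.0)     # random, unobserved
    sigma = pm.HalfNormal("sigma", sigma=1.0)        # random, unobserved
    # y is random *and* observed; x enters only as a constant in the mean
    y = pm.Normal("y", mu=alpha + beta * x, sigma=sigma, observed=y_obs)

graph = pm.model_to_graphviz(model)   # a graphviz.Digraph you can render
```

If I recall the rendering correctly, y shows up as a shaded node inside a plate labeled with the number of observations, alpha, beta, and sigma as unshaded nodes, and the constant x gets no node of its own at all, which is exactly the observed/unobserved and random/constant split above.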

You’ll also note that there’s no way to tell in which order the arguments to Normal are being displayed: the figure is leaning on our intuition that mu is the location and sigma is the scale parameter, but that’s not part of the notation! The graphical renderings are massively underspecified in most cases.
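To spell out the convention the figure is leaning on, the location-scale parameterization being assumed is

\textrm{normal}(y \mid \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}} \exp\!\left(-\frac{1}{2}\left(\frac{y - \mu}{\sigma}\right)^2\right),

where \mu is the location (mean) and \sigma > 0 is the scale (standard deviation); nothing in the rendered graph itself forces that reading.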

The bigger box with the number in it is a plate; it indicates that the nodes inside are replicated that many times, with an index running over the plate.

Very often, the graphical models you find in papers are incomplete: they don’t spell out the requisite indexing, they don’t give the distributions (the classic LDA graphical models are guilty on both counts), or they don’t tell you the argument order. In the example here, the notation doesn’t tell you that mu is the location and sigma is the scale of the normal; that’s just a naming convention that isn’t part of the graphical model notation. Usually people try to keep the natural order of the distribution’s parameters, so the example here is extra confusing.
