Implementing hierarchical priors encoded in tree structures

Hi, many thanks for your detailed reply! I’ve learned a lot (I’m not fluent in either probabilistic modeling or PyMC).

I tried your suggestion above and at least managed to make the code run. However, I was not able to achieve the result I’m looking for.

I will attach my partially-working solution in a separate reply.

My goal:

  • I would like to build a linear model for a regression problem in which the input variables correspond to different “categories”
  • the categories can be organized into a tree structure; for instance, a general category (e.g., fruit) contains more specific ones (e.g., apple and orange) as children
  • finally, I would like to incorporate these relations into the regression model, so that the distribution of a general category informs the distributions of its children (see the small sketch right after this list)
    • this would be useful when very few observations are available for the child variables
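
To make the fruit/apple idea concrete, this is the kind of prior structure I have in mind (just a sketch with made-up names and an arbitrary sigma):

import pymc as pm

with pm.Model():
    # the child coefficients' priors are centered on the parent's coefficient
    beta_fruit = pm.Normal("beta_fruit", mu=0, sigma=10)
    beta_apple = pm.Normal("beta_apple", mu=beta_fruit, sigma=10)
    beta_orange = pm.Normal("beta_orange", mu=beta_fruit, sigma=10)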

Below is an example to illustrate the expected outcome.

(Note that this example is not using the vectorized approach suggested above, thus does not scale.)

Say I have 5 variables/categories, A to E, with the following relations: A is the root and is the parent of B and C, and B is the parent of D and E.

from treelib import Tree

# say we have 5 variables, with the following relations
tree_dict = {"A": {"B": {"D": None, "E": None}, "C": None}}

# Create a new tree
tree = Tree()


# Function to add nodes recursively
def add_nodes(tree, parent, name_to_children):
    for name, children in name_to_children.items():
        tree.create_node(name, name, parent=parent)
        if children:
            add_nodes(tree, name, children)


# Add nodes to the tree
add_nodes(tree, None, tree_dict)

# Print the tree
print(tree)

which gives the following tree structure

A
├── B
│   ├── D
│   └── E
└── C

Then I build the model using recursion as follows

import numpy as np
import pymc as pm
from pymc import HalfCauchy, Model, Normal

RANDOM_SEED = 8927
rng = np.random.default_rng(RANDOM_SEED)


def get_model(tree, X_data, y_data):
    predictor_names = list(sorted(tree.nodes.keys()))
    coords = {
        "predictor": predictor_names,
        "sample": np.arange(X_data.shape[0]),
    }
    with Model(coords=coords) as model:
        SIGMA = 10

        def build_beta(tree, node_name, parent_predictor):
            """construct the input variables recursively, note that it does not scale for large trees"""
            ret = {}
            if parent_predictor is None:
                # root
                predictor = Normal(name=node_name, mu=0, sigma=SIGMA)
            else:
                predictor = Normal(
                    name=node_name, mu=parent_predictor, sigma=SIGMA
                )

            ret[node_name] = predictor

            for child in tree.children(node_name):
                ret |= build_beta(tree, child.identifier, predictor)
            return ret

        X = pm.MutableData(
            "X",
            X_data,
            dims=["sample", "predictor"],
        )

        # Define priors
        sigma = HalfCauchy("sigma", beta=5)
        beta_var_dict = build_beta(tree, tree.root, None)
        beta = pm.Deterministic(
            "beta",
            pm.math.stack(
                [
                    beta_var_dict[pred_name]
                    for pred_name in model.coords["predictor"]
                ]
            ),
            dims="predictor",
        )

        # Define likelihood
        likelihood = Normal(
            "y", mu=pm.math.dot(X, beta), sigma=sigma, observed=y_data
        )
    return model

To illustrate my point, I fit the model on some synthetic data in which certain variables are “masked out”. The expectation is that information about the masked-out variables can be inferred from the posteriors of their parents.

In the example below, I deliberately mask the values corresponding to D and E, so nothing can be learned about D and E from the data if the variables are assumed to be independent.

However, under the above model specification, D and E are children of B, so their coefficients have priors centered on B’s coefficient. Since their columns in X are all zeros, the data carries no information about them, and their posteriors should stay close to that prior; in particular, their posterior means should be close to B’s posterior mean (which is 2).


# beta is the coefficients of the underlying model
#                A  B  C  D  E
beta = np.array([1, 2, 3, 4, 5])

num_samples = 500

X = rng.random((num_samples, len(beta)))
# "hide" D and E from the input data by zeroing them out
X[:, -2:] = 0

y = np.dot(X, beta) + rng.normal(scale=0.001, size=num_samples)

with get_model(tree, X, y) as model:
    pm.set_data({"X": X})
    trace = pm.sample(tune=3000)
print(trace.posterior.beta.mean(dim=("chain", "draw")).to_series())

The output is something like:

predictor
A    1.000126
B    1.999723
C    3.000123
D    2.016451 -> close to B's mean
E    1.975278 -> same as above

Note that D and E’s posterior mean “inherits” from B’s posterior mean, due to the tree structure incorporated in the model.
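
If it helps, the spread of the posterior (not just the mean) can also be inspected with ArviZ, e.g.:

import arviz as az

print(az.summary(trace, var_names=["beta"]))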

My questions are perhaps a bit open:

  • assuming we can implement the above model using pytensor.scan (as suggested in the previous reply), is such an approach a reasonable way to incorporate structural dependencies among the variables? (I sketched what I imagine a vectorized, though not scan-based, version could look like after this list.)

  • I have also been thinking of using a multivariate Gaussian. The main idea is to use the covariance matrix to capture the inter-dependencies among the variables. However, I do not see a good way to initialize the covariance matrix.
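
To make the first question more concrete, below is roughly the vectorized formulation I have in mind. It does not use pytensor.scan; instead it rewrites each coefficient as SIGMA times the sum of standard-normal offsets along the path from the root (a non-centered reparameterization), using a path/ancestor indicator matrix. The matrix is hard-coded here for the 5-node example, but I believe it could be built from the treelib tree (e.g. with tree.rsearch, which walks from a node back to the root), so please treat this purely as a sketch:

import numpy as np
import pymc as pm

names = ["A", "B", "C", "D", "E"]
SIGMA = 10

# path_matrix[i, j] = 1 if node j lies on the path from the root to node i
# (including node i itself); rows/columns follow the order of `names`
path_matrix = np.array(
    [
        [1, 0, 0, 0, 0],  # A
        [1, 1, 0, 0, 0],  # B
        [1, 0, 1, 0, 0],  # C
        [1, 1, 0, 1, 0],  # D
        [1, 1, 0, 0, 1],  # E
    ]
)

with pm.Model(coords={"predictor": names}):
    # one standard-normal offset per node
    z = pm.Normal("z", mu=0, sigma=1, dims="predictor")
    # beta_node = SIGMA * (sum of offsets along the path from the root),
    # which should be equivalent to beta_child ~ Normal(beta_parent, SIGMA)
    beta = pm.Deterministic(
        "beta", SIGMA * pm.math.dot(path_matrix, z), dims="predictor"
    )

I’m not sure whether this is equivalent in spirit to the scan-based construction you suggested, or whether scan would still be preferable for larger trees.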