What is the best way to define a multivariate regression in the context of hierarchical Bayesian models?

I am new to the world of PyMC3, so apologies if my wording is not 100% correct.

I am trying to build a hierarchical Bayesian model with many predictors.

I am able to get a working version by adapting the syntax of this tutorial.

However, the code becomes a bit wordy, since I need to define the parameters for 20+ predictors (the snippet below uses only a handful of predictors, but you get the point).

with pm.Model() as hbm1:
    # Hyperpriors for group nodes
    mu_a = pm.Normal("mu_a", mu=0.0, sigma=100)
    sigma_a = pm.HalfNormal("sigma_a", 5.0)
    mu_b = pm.Normal("mu_b", mu=0.0, sigma=100)
    sigma_b = pm.HalfNormal("sigma_b", 5.0)
    mu_c = pm.Normal("mu_c", mu=0.0, sigma=100)
    sigma_c = pm.HalfNormal("sigma_c", 5.0)
    mu_d = pm.Normal("mu_d", mu=0.0, sigma=100)
    sigma_d = pm.HalfNormal("sigma_d", 5.0)
    mu_e = pm.Normal("mu_e", mu=0.0, sigma=100)
    sigma_e = pm.HalfNormal("sigma_e", 5.0)

    a = pm.Normal("a", mu=mu_a, sigma=sigma_a, shape=n_groups)
    b = pm.Normal("b", mu=mu_b, sigma=sigma_b, shape=n_groups)
    c = pm.Normal("c", mu=mu_c, sigma=sigma_c, shape=n_groups)
    d = pm.Normal("d", mu=mu_d, sigma=sigma_d, shape=n_groups)
    e = pm.Normal("e", mu=mu_e, sigma=sigma_e, shape=n_groups)

    # Model error
    eps = pm.HalfCauchy("eps", 5.0)

    y_est = (a[group_id] +
             b[group_id] * df["0"].values +
             c[group_id] * df["1"].values +
             d[group_id] * df["2"].values +
             e[group_id] * df["3"].values)


    # Data likelihood
    y_like = pm.Normal("yoy_like", mu=y_est, sigma=eps, observed=df["target"])

with hbm1:
    hbm1_trace = pm.sample(2000, tune=2000, target_accept=0.9)

Instead of defining the priors for each parameter individually, I tried to wrap everything in a single array with one extra dimension. See the example below:

X = df[["cst", *predictors[:n_predictors]]].values
with pm.Model() as hbm2:
    # Hyperpriors for group nodes
    mu = pm.Normal("mu", mu=0.0, sigma=100)
    sigma = pm.HalfNormal("sigma", 5.0)

    beta = pm.Normal("beta", mu=mu, sigma=sigma, shape=(n_predictors + 1, n_groups))

    # Model error
    eps = pm.HalfCauchy("eps", 5.0)

    y_est = (X * beta[:, group_id].T).sum(axis=-1)

    # Data likelihood
    y_like = pm.Normal("yoy_like", mu=y_est, sigma=eps, observed=df["target"])
    
with hbm2:
    hbm2_trace = pm.sample(2000, tune=2000, target_accept=0.9)

I now have a couple of questions:

  • Do the first and the second syntax produce exactly the same model, or am I doing something wrong?
  • What would be the equivalent of the command pm.traceplot(hbm1_trace, ["a", "b", "c"]) for the second model? I would like to slice my beta and show the distributions for the individual predictors, but I can't figure out how to do that properly.

Hi,
Regarding the first question: no, it is not the same model. In the second version all the betas share a single hyperprior, so they will be correlated a priori through the common mu and sigma. Besides, in the second model you only have one “sigma” instead of several.
If the first model is too “wordy”, why don't you use a shape parameter for the mus and sigmas? See the sketch below.
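Something along these lines (an untested sketch, reusing X, group_id, n_predictors, and n_groups from your second snippet):

with pm.Model() as hbm1b:
    # One hyperprior entry per coefficient instead of a scalar pair each
    mu = pm.Normal("mu", mu=0.0, sigma=100, shape=n_predictors + 1)
    sigma = pm.HalfNormal("sigma", 5.0, shape=n_predictors + 1)

    # Row i of beta gets its own hyperpriors mu[i], sigma[i];
    # the [:, None] adds an axis so they broadcast over groups
    beta = pm.Normal("beta", mu=mu[:, None], sigma=sigma[:, None],
                     shape=(n_predictors + 1, n_groups))

    # Model error
    eps = pm.HalfCauchy("eps", 5.0)

    y_est = (X * beta[:, group_id].T).sum(axis=-1)

    # Data likelihood
    y_like = pm.Normal("y_like", mu=y_est, sigma=eps, observed=df["target"])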

Regarding the second question, I don't know the exact syntax off the top of my head, but you will have more possibilities by using ArviZ instead of pm.traceplot. Something like the sketch below might work.
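For example (a sketch; "beta_dim_0" is the default dimension name ArviZ assigns to beta's first axis when no explicit dims are declared, and recent PyMC3 versions delegate pm.traceplot to az.plot_trace anyway):

import arviz as az

# Convert the PyMC3 trace to InferenceData, then slice beta along its
# first (predictor) dimension to plot only selected coefficients
idata = az.from_pymc3(trace=hbm2_trace, model=hbm2)
az.plot_trace(idata, var_names=["beta"], coords={"beta_dim_0": [0, 1, 2]})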

@greghor you could try Bambi to build hierarchical models on top of PyMC3 using a formula interface like the one in R. There are several examples in the Examples section of the documentation; a rough sketch of what your model could look like is below.
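For instance, a hypothetical sketch (assuming your predictor columns are named x0 through x3 and your group column is named group; the exact formula API depends on your Bambi version):

import bambi as bmb

# Common slopes plus group-varying intercept and slopes, roughly
# analogous to hbm1/hbm2 (Bambi parameterizes group effects as
# deviations around the common effects)
model = bmb.Model("target ~ x0 + x1 + x2 + x3 + (1 + x0 + x1 + x2 + x3 | group)", df)
results = model.fit(draws=2000, tune=2000, target_accept=0.9)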
Also, feel free to ping me if you have any questions about how it works.

Hey Thomas, thanks for your reply.
Fair point, I shall use a shape parameter for the mus and sigmas as well!

Thank you for the suggestion. It looks super interesting; too bad I didn't find this earlier. Love the name :deer:
