Recursive bayesian network (two outcomes)

I am trying to model the following:

image

where as blue ovals denotes outcomes and yellow independent variables.
As you can see, we have this recursive structure where y2 depends on y1.

I am wondering if the following code indeed models this or if i missed out on something, to me it makes sense as it is now though but it may be some error with defining two likelihoods/outcomes and then sampling?

import numpy as np
import pandas as pd
import pymc as pm
import xarray as xr
RANDOM_SEED = 8927
rng = np.random.default_rng(RANDOM_SEED)

from pymc import HalfCauchy, Model, Normal, sample

size = 200
true_intercept_y1 = 1
true_slope_x1 = 2
true_slope_x2 = 0.2

x1 = np.linspace(0, 1, size)
x2 = np.linspace(2, 4, size)
true_regression_line = true_intercept_y1 + true_slope_x1 * x1 + true_slope_x2 * x2
# add noise
y1 = true_regression_line + rng.normal(scale=0.5, size=size)

x3 = np.linspace(3, 4, size)
x4 = np.linspace(4, 5, size)
true_intercept_y2 = 1
true_slope_x3 = 2
true_slope_x4 = 0.2
y1_effect = 2
true_regression_line_y2 = true_intercept_y2 + true_slope_x3 * x3 + true_slope_x4 * x4 + y1 * y1_effect
y2 = true_regression_line_y2 + rng.normal(scale=4, size=size)

data = pd.DataFrame(dict(x=x1, x2=x2, x3=x3, x4=x4, y=y1, y2=y2))

with Model() as model:  # model specifications in PyMC are wrapped in a with-statement
    # Define priors
    sigma = HalfCauchy("sigma", beta=10)
    intercept = Normal("Intercept", 0, sigma=20)
    slope = Normal("slope", 0, sigma=20)
    slopex2 = Normal("slopex2", 0, sigma=10)
    # Define likelihood
    likelihood_y1 = Normal("y", mu=intercept + slope * x1 + slopex2 * x2, sigma=sigma, observed=y1)

    sigma2 = HalfCauchy("sigma2", beta=10)
    intercept2 = Normal("Intercept2", 0, sigma=20)
    slopex3 = Normal("slopex3", 0, sigma=20)
    slopex4 = Normal("slopex4", 0, sigma=10)
    slopey1 = Normal("slopey1", 0, sigma=10)

    likelihood_y2 = Normal("y2", mu=intercept2 + slopex3 * x3 + slopex4 * x4 + slopey1 * likelihood_y1, observed=y2)

    idata = sample(3000)

That should be fine. You may find this discussion on models that use observed variables useful: How Does the Sampler Use Covariate Observed Data?

what if i would actually want the predictions of y1(posterior predictive distribution) in the downstream task(as a prior) of estimating y2? wouldnt this be the typical approach when working with these structural causal models and bayesian networks? seems a bit strange that it uses the actually observed values in the downstream task…as explained in the linked thread

It does that in prior and posterior predictive sampling. During posterior sampling there’s nothing to infer for observed variables (you’re conditioning on the observed values)

1 Like

this essentially means that the samples of all parameters in equation 2 (y_2 = true_intercept_y2 + true_slope_x3 * x3 + true_slope_x4 * x4 + y1 * y1_effect) in the code above will be widely different depending on if i look at the posterior samples (pm.sample()) or the posterior predictive samples (pm.sample_posterior_predictive()), because when looking at the posterior_predictive_samples it is using samples of y1 derived from sampling from equation 1, which is not true when considering pm.sample()(the posteriors)… however, for equation 1 the posterior samples and posterior_predictive_samples should be essentially the same since this equation is not affected by this recursive structure/structural equation structure?

Can you rephrase the question, I don’t completely follow

sorry, do you understand it better now?

this essentially means that samples of slope for x1 should come from almost the same distribution when considering sampling from the posterior and the posterior predictive distribution, but sampling from slope for x3 should be very different for the posterior and the posterior predictive distribution, tried to see if that was the case and it was not, x1 was also widely different depending on if i considered samples from the posterior or the posterior predictive:

import numpy as np
import pandas as pd
import pymc as pm
import xarray as xr
RANDOM_SEED = 8927
rng = np.random.default_rng(RANDOM_SEED)
import pymc.sampling_jax as jax
from pymc import HalfCauchy, Model, Normal, sample
#
size = 200
true_intercept_y1 = 1
true_slope_x1 = 2
true_slope_x2 = 0.2

x1 = np.linspace(0, 1, size)
x2 = np.linspace(2, 4, size)
true_regression_line = true_intercept_y1 + true_slope_x1 * x1 + true_slope_x2 * x2
# add noise
y1 = true_regression_line + rng.normal(scale=0.5, size=size)

x3 = np.linspace(3, 4, size)
x4 = np.linspace(4, 5, size)
true_intercept_y2 = 1
true_slope_x3 = 2
true_slope_x4 = 0.2
y1_effect = 2
true_regression_line_y2 = true_intercept_y2 + true_slope_x3 * x3 + true_slope_x4 * x4 + y1 * y1_effect
y2 = true_regression_line_y2 + rng.normal(scale=4, size=size)

data = pd.DataFrame(dict(x=x1, x2=x2, x3=x3, x4=x4, y=y1, y2=y2))

with Model() as model:  # model specifications in PyMC are wrapped in a with-statement
    # Define priors
    sigma = HalfCauchy("sigma", beta=10)
    intercept = Normal("Intercept", 0, sigma=20)
    slope = Normal("slope", 0, sigma=20)
    slopex2 = Normal("slopex2", 0, sigma=10)
    # Define likelihood
    likelihood_y1 = Normal("y", mu=intercept + slope * x1 + slopex2 * x2, sigma=sigma, observed=y1)

    sigma2 = HalfCauchy("sigma2", beta=10)
    intercept2 = Normal("Intercept2", 0, sigma=20)
    slopex3 = Normal("slopex3", 0, sigma=20)
    slopex4 = Normal("slopex4", 0, sigma=10)
    slopey1 = Normal("slopey1", 0, sigma=10)

    likelihood_y2 = Normal("y2", mu=intercept2 + slopex3 * x3 + slopex4 * x4 + slopey1 * likelihood_y1, observed=y2)

    trace = jax.sample_blackjax_nuts(1000,
                             tune=1000,
                             target_accept=0.97,
                             cores=2,
                             random_seed=0,
                             idata_kwargs={"log_likelihood": True},
                             return_inferencedata=True)

    var_names_deterministic = ["slope", "slopex2", "slopex3", "slopex4", "slopey1"]
    inf = pm.sample_posterior_predictive(trace, random_seed=0, var_names=var_names_deterministic, extend_inferencedata=False)

    slopex1_posterior = trace.posterior.slope
    slopex1_posterior_preditive = inf.posterior_predictive.slope
    slopex2_posterior = trace.posterior.slopex2
    slopex2_posterior_predictive = inf.posterior_predictive.slopex2

    slopey1_posterior = trace.posterior.slopey1
    slopey1_posterior_preditive = inf.posterior_predictive.slopey1
    slopex3_posterior = trace.posterior.slopex3
    slopex3_posterior_predictive = inf.posterior_predictive.slopex3

Careful with selecting unobserved variables in sample_posterior_predictive, it might not do what you think it does: pymc.sample_posterior_predictive — PyMC v5.13.1 documentation

i find it really had to grasp the nature of sample_posterior_predictive even after reading the provided link, e.g this bugs me out:

, compare this to:

As i understnad it, the only reason z gets resampled in the latter case and not the first is the inclusion of “det_xy” in varnames in the latter. This means we have to sample xy which means we need to sample new values for z, i assume… it is essentially this hierarchical structure that creates issues in the latter case since z is samples depending on det_xy

To try to apply this comprehension in my case and how i can solve the issue above, the only way to do this seems to be to wrap the slopes i am interested in in determinstic variables, and then be extremely careful of the order and which variables i include in which sample_posterior_predictive call, since i need to make several ones to avoid the issue above?

Essentially what i am interested in is retrieving the slopes(posterior samples) of x1 and x2 conditioned on the observed data of y1, x1, x2, then given posterior predictive samples of y1, retrieve the posteriors for x3, x4 and slopey1 conditional on the observed data of y2, x3, x4 and the posterior predictive samples of y1. Now how this can be achieved is out of my comprehension as for now, but it seems like we have to set up a quite complex order of calls to the posterior_predictive_sample function really thinking about which variables have dependencies and not…i have no clue how though

Yes it’s danger field. We will fix this soon: Make `sample_posterior_predictive` API less surprising · Issue #7069 · pymc-devs/pymc · GitHub

In the meantime you can compile the specific function you want instead of trying to get sample_posterior_predictive to do what you need. I can give you some code next week as an example if you want to go down that path

great that you are fixing it.
i would like to get this sorted asap, could you verify if my statements are correct below?

"As i understnad it, the only reason z gets resampled in the latter case and not the first is the inclusion of “det_xy” in varnames in the latter. This means we have to sample xy which means we need to sample new values for z, i assume… it is essentially this hierarchical structure that creates issues in the latter case since z is samples depending on det_xy

To try to apply this comprehension in my case and how i can solve the issue above, the only way to do this seems to be to wrap the slopes i am interested in in determinstic variables, and then be extremely careful of the order and which variables i include in which sample_posterior_predictive call, since i need to make several ones to avoid the issue above?

Essentially what i am interested in is retrieving the slopes(posterior samples) of x1 and x2 conditioned on the observed data of y1, x1, x2, then given posterior predictive samples of y1, retrieve the posteriors for x3, x4 and slopey1 conditional on the observed data of y2, x3, x4 and the posterior predictive samples of y1. Now how this can be achieved is out of my comprehension as for now, but it seems like we have to set up a quite complex order of calls to the posterior_predictive_sample function really thinking about which variables have dependencies and not…i have no clue how though"

the main reason i want to use sample_posterior_predictive is that i have everything setup with mutable data and mappings over the prediction horizon

sample_posterior_predictive is not doing anything too fancy, that you can’t do yourself. 90% of the complexity is figuring out which variables can be imputed from the posterior trace and which ones should be resampled. For your specific case, it may be hurting more than it helps.

Let me show how you can cook up your custom sample_posterior_predictive function easily enough (hopefully):

import pymc as pm

with pm.Model(coords={"trial": range(3)}) as m:
  x1 = pm.Normal("x1")
  y1 = pm.Normal("y1", x1, observed=[0, -1, 1], dims="trial")

  x2 = pm.Normal("x2")
  y2 = pm.Normal("y2", x2 * y1, observed=[-2, 0, 2], dims="trial")

  # posterior = pm.sample().posterior
  # For faster example, will use prior and pretend it's a posterior
  posterior = pm.sample_prior_predictive().prior

Now say you want a function that takes x1 and x2 posterior draws, and samples new y1 and y2

inputs = [x1, x2]
outputs = [y1, y2]
custom_pred_fn = m.compile_fn(outs=outputs, inputs=inputs, random_seed=5)

# Test with one point
print(custom_pred_fn({"x1": 0.5, "x2": 0.9}))
# [array([ 0.41297267,  1.5867006 , -0.14875281]), array([0.21406306, 1.45543159, 0.13762093])]

# Confirm draws change over functions evaluations
print(custom_pred_fn({"x1": 0.5, "x2": 0.9}))
# [array([-0.33605213,  1.25911615, -1.2710826 ]), array([-0.81434557,  3.4961431 , -0.11901213])]

To apply it over your posterior trace, you can use the following pymc helper:

from pymc.backends.arviz import apply_function_over_dataset, coords_and_dims_for_inferencedata

coords, dims = coords_and_dims_for_inferencedata(m)

posterior_pred = apply_function_over_dataset(
    fn=custom_pred_fn,
    dataset=posterior[[inp.name for inp in inputs]],  # Select input variables in same order as function
    output_var_names=[out.name for out in outputs],  # Assign output variable names in same order as function
    coords=coords,
    dims=dims,
)
print(posterior_pred)
# <xarray.Dataset> Size: 28kB
# Dimensions:  (chain: 1, draw: 500, trial: 3)
# Coordinates:
#   * chain    (chain) int64 8B 0
#   * draw     (draw) int64 4kB 0 1 2 3 4 5 6 7 ... 493 494 495 496 497 498 499
#   * trial    (trial) int64 24B 0 1 2
# Data variables:
#     y1       (chain, draw, trial) float64 12kB -0.9762 -0.2946 ... -1.354
#     y2       (chain, draw, trial) float64 12kB -1.15 -0.0896 ... 0.8917 1.43
# Attributes:
#     created_at:                 2024-04-22T07:26:36.219180
#     arviz_version:              0.17.1
#     inference_library:          pymc
#     inference_library_version:  5.13.0

This is still entirely compatible with using and setting Data containers

Yes

My warning is that in general you should be very careful about including any non-observed variables in var_names as that is generally only what you want in very restricted cases. That’s why your experiment above failed, not because your logic was wrong but because it didn’t do what you wanted it to do.

I think you may get what you want with just default sample_posterior_predictive (don’t override var_names)? Just need to be careful that no inferred variables exist directly between the two ys, since otherwise they would be resampled. In your examples so far I don’t think that’s the case, as the values of likelihood_1 only show up in likelihood2 and not as the parameters of any of slopex or x data.

If default sample_posterior_predictive is not building the function you want it to build, you can use my code above to build it yourself.