Combining multiple models into a single model for uncertainty propagation

I have several models that all work decently well independently. In my actual workflow I would like the output of the first model to be a predictor in my second, and likewise for a third model. I have reason to believe these are all sequential processes. The main thing is that I want to correctly propagate the uncertainty (rather than just compute the likelihood mean and pass that to the next model). Additionally, I would like to be able to perturb the model by exploring the impact of changing one variable while holding all other inputs constant (or at least set to some specified value).

I have a first model that is a basic linear regression c ~ a + b.

# Define features
m1_features = ["a", "b"]

# Define target
m1_target = "c"

coords = {"features": m1_features}

with pm.Model(coords=coords) as m1:

    # Define X data
    m1X = pm.MutableData("m1X", train[m1_features].values)
    m1y = train[m1_target].values
    
    # Define weights
    beta = pm.StudentT("beta", mu=0, sigma=10, nu=7, dims="features")

    # Define intercept
    alpha = pm.StudentT("alpha", mu=0, sigma=10, nu=7)

    # Define linear model
    lm = alpha + (m1X*beta).sum(axis=-1)

    # Define model error
    err1 = pm.Exponential("err1",0.5)
    
    # Define likelihood for model 1
    like1 = pm.StudentT("like1", mu=lm, sigma=err1, nu=7, observed=m1y, shape=m1X.shape[0])

I then have a second model (BART) that takes the output of model 1 as an input: e ~ Normal(mu, sigma) where mu ~ BART(c, d) (probably not the correct nomenclature, but hopefully the idea is clear).

m2_zscore_features = ["c", "d"]
m2_target = "e"

with pm.Model() as m2:
    # Define X data
    m2X = pm.MutableData("m2X", train[m2_zscore_features].values)
    m2y = train[m2_target].to_numpy()
    
    # mu
    mu = pmb.BART("mu",X=m2X, Y=np.log(m2y+0.01), m=100)

    # Define model error
    err2 = pm.Exponential("err2",0.5)
    
    # Define likelihood for model 2
    like2 = pm.Normal("like2", mu=pm.math.exp(mu), sigma=err2, observed=m2y, shape=mu.shape[0])

Finally, the output of model 2 is passed into a third model, which uses an HSGP.


m3_features = "e"
m3_target = "f"

train = train.sort_values(by=m3_features,ascending=True)

coords = {
    "x": train[m3_features].reset_index().index.tolist()
}

with pm.Model(coords=coords) as m3:

    # Define data
    m3X = pm.MutableData("m3X", train[m3_features].to_numpy().reshape(-1,1))
    m3y = train[m3_target].to_numpy()

    # Set prior on GP hyperparameters
    eta = pm.Exponential("eta", lam=1)
    ell = pm.InverseGamma("ell", alpha=2, beta=20)

    # Specify covariance function and GP
    cov = eta**2 * pm.gp.cov.Matern52(1, ls=ell)
    
    # HSGP
    gp = pm.gp.HSGP(m=[20], c=3, cov_func=cov)

    # Function prior
    ff = gp.prior("ff", X=m3X, dims="x")

    # Model err
    err3 = pm.HalfNormal("err3", sigma=3)
    like3 = pm.Normal("like3", mu=ff, sigma = err3, observed=m3y)

I started to try to combine them into a single model, but I can’t figure out how to get the model to “see” that the likelihood of a prior model represents one of the inputs to the next. It gets further complicated because I need any input data to be in a mutable data container so I can make predictions, and I know that BART is fussy about data containers.

# Model 1 features and target
m1_features = ["a", "b"]
m1_target = "c"

# Model 2 features and target
m2_zscore_features = ["c", "d"]
m2_target = "e"

# Model 3 features and target
m3_features = "e"
m3_target = "f"

coords = {"features" : m1_features, # For model 1
          "x"        : train[m3_features].reset_index().index.tolist()} # for GP

with pm.Model(coords=coords) as combined:

    # Define data
    m1X = pm.MutableData("m1X", train[["a", "b"]].values)
    m1y = train["c"].values
    
    m2X = pm.MutableData("m2X", train[["c", "d"]].values)
    m2y = train["e"].to_numpy()
    
    m3X = pm.MutableData("m3X", train["e"].to_numpy().reshape(-1,1))
    m3y = train["f"].to_numpy()

    # --------- Model 1 ---------- #
    
    # Define weights
    beta = pm.StudentT("beta", mu=0, sigma=10, nu=7, dims="features")

    # Define intercept
    alpha = pm.StudentT("alpha", mu=0, sigma=10, nu=7)

    # Define linear model
    lm = alpha + (m1X*beta).sum(axis=-1)

    # Define model error
    err1 = pm.Exponential("err1",0.5)
    
    # Define likelihood for model 1
    like1 = pm.StudentT("like1", mu=lm, sigma=err1, nu=7, observed=m1y, shape=m1X.shape[0])
    
    # --------- Model 2 ---------- #

    # mu
    mu = pmb.BART("mu",X=m2X, Y=np.log(m2y+0.01), m=100)

    # Define model error
    err2 = pm.Exponential("err2",0.5)
    
    # Define likelihood for model 2
    like2 = pm.Normal("like2", mu=pm.math.exp(mu), sigma=err2, observed=m2y, shape=mu.shape[0])


    # --------- Model 3 ---------- #

    # Set prior on GP hyperparameters
    eta = pm.Exponential("eta", lam=1)
    ell = pm.InverseGamma("ell", alpha=2, beta=20)

    # Specify covariance function and GP
    cov = eta**2 * pm.gp.cov.Matern52(1, ls=ell)
    
    # HSGP
    gp = pm.gp.HSGP(m=[20], c=3, cov_func=cov)

    # Function prior
    ff = gp.prior("ff", X=m3X, dims="x")

    # Model err
    err3 = pm.HalfNormal("err3", sigma=3)
    like3 = pm.Normal("like3", mu=ff, sigma = err3, observed=m3y)

My questions are:

  1. How do I pass the estimate of a model as an input to the next model?
  2. How do I manage the data containers in a way that allows all the values that represent predictors to be set with set_data()? I know both BART and HSGPs allow this for prediction, so it seems like it should be possible.
  3. Less important for this question: do the inputs to GPs have to be ordered? It makes sense when the x axis represents time or another sequential axis, but what about when the space is just a set of values between two points? When I initially modeled the data I did not sort it by the values on the x axis (e in my dataframe), and I was having problems with divergences and weird posterior predictions. When I ordered the data it cleared things up…

I saw this related post, but it doesn’t look like it was ever addressed.

I’ve attached a copy of the data to help if needed. Just FYI, all the variables are standardized except e, which is normalized between 0 and 1.

RANDOM_SEED = 101
rng = np.random.default_rng(RANDOM_SEED)

# Sample some portion of the data as training
train_idx = data.sample(frac=0.7,random_state=RANDOM_SEED, replace=False).index.tolist()

# Assign to train set
data = data.assign(is_training_set = 0)
data.loc[train_idx, "is_training_set"] = 1

# Split train and test data
train = data.loc[data.is_training_set==1,:]
test  = data.loc[data.is_training_set==0,:]

data.csv (22.7 KB)

A pet peeve of mine is naming the random variable associated with the observed data “likelihood”. Yes, it is used to compute the likelihood term in Bayes’ law, but more importantly, it’s a random variable that represents the distribution from which the observed data was drawn. What I mean is that when you do pm.sample_prior_predictive or pm.draw and look at, for example, like1, you will not be looking at a likelihood value. You will be looking at a sample of possible data from your proposed data-generating process (the model).
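For example, with the first model above (a quick sketch you can run):

with m1:
    # Draws of `like1` are simulated datasets of c from the proposed
    # generative process, not likelihood values
    prior = pm.sample_prior_predictive()

simulated_c = prior.prior_predictive["like1"]  # dims: (chain, draw, obs)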

This also goes to a fundamental difference between PyMC and other PPLs like Stan. In PyMC, you define the generative process for the data, and PyMC automatically works out the log probability of the data given that process. You never directly provide likelihood computations (unless you include a pm.Potential…).

All of this is a long-winded way of saying that I would propose renaming like1 to c_hat or c_pred, so that e_hat ~ N(c_hat, d) and f_hat ~ GP(e_hat). You can just pass the RVs along the chain as you expect, and PyMC will work out the joint log probability over all the models for you in the background.
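Here is a minimal sketch of that chaining idea with two toy linear stages (illustrative names and simulated data; the BART and HSGP pieces slot in the same way):

import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
a = rng.normal(size=100)
c_obs = 2 * a + rng.normal(scale=0.3, size=100)
e_obs = -c_obs + rng.normal(scale=0.3, size=100)

with pm.Model() as combined:
    beta1 = pm.Normal("beta1", mu=0, sigma=10)
    err1 = pm.Exponential("err1", 0.5)
    # Stage 1: c_hat is a random variable tied to the observed c
    c_hat = pm.Normal("c_hat", mu=beta1 * a, sigma=err1, observed=c_obs)

    beta2 = pm.Normal("beta2", mu=0, sigma=10)
    err2 = pm.Exponential("err2", 0.5)
    # Stage 2: the stage-1 RV is the predictor, and PyMC works out the
    # joint log probability over both stages
    e_hat = pm.Normal("e_hat", mu=beta2 * c_hat, sigma=err2, observed=e_obs)

    idata = pm.sample()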

For question 2, I would set all your data once as just X and y, and index into it using the *_features arrays (converted to integer index values so they’re compatible with pytensor). That way you only have to swap out the data once rather than three times. As long as the columns of the new data are the same as the old, it will work fine.
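Something like this, assuming train as defined earlier and a fixed column order (sketch; the index arrays are hypothetical):

import numpy as np
import pymc as pm

all_cols = ["a", "b", "c", "d", "e", "f"]
m1_idx = np.array([all_cols.index(col) for col in ["a", "b"]])  # [0, 1]
m2_idx = np.array([all_cols.index(col) for col in ["c", "d"]])  # [2, 3]

with pm.Model() as model:
    # One container; a single set_data call swaps everything at once
    X = pm.MutableData("X", train[all_cols].values)
    m1X = X[:, m1_idx]  # model 1 predictors
    m2X = X[:, m2_idx]  # model 2 predictors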

Q3 I will leave to an expert like @bwengals


Hi @jessegrabowski, thanks for the answer. Your comment about the likelihood is well received. Despite understanding that I am dealing with an RV (see my question 1: how do I pass the estimate of a model as an input to the next model?), I definitely had myself confused by naming it a likelihood. Definitely something I will change in future models.

Your answer makes a lot of sense in theory, but I am having a bit of trouble with the implementation. I know you mentioned putting everything into a single data container and indexing into it, but for now I have kept them separate for clarity.

coords = {"x" : train["e"].reset_index().index.tolist()} # for GP

with pm.Model(coords=coords) as combined:

    # Define data
    a = pm.MutableData("a",  train["a"].values)
    b = pm.MutableData("b",  train["b"].values)
    c = pm.ConstantData("c", train["c"].values)
    d = pm.MutableData("d", train["d"].values)
    e = pm.ConstantData("e", train["e"].to_numpy())
    f = train["f"].to_numpy()

    # --------- Model 1 ---------- #
    
    # Define weights
    beta_a = pm.StudentT("beta_a", mu=0, sigma=10, nu=7)
    beta_b = pm.StudentT("beta_b", mu=0, sigma=10, nu=7)

    # Define intercept
    alpha = pm.StudentT("alpha", mu=0, sigma=10, nu=7)

    # Define linear model
    lm = alpha + a*beta_a + b*beta_b

    # Define model error
    err1 = pm.Exponential("err1",0.5)
    
    # Define likelihood for model 1
    c_hat = pm.StudentT("c_hat", mu=lm, sigma=err1, nu=7, observed=c, shape=a.shape)
    
    # --------- Model 2 ---------- #

    # mu
    mu = pmb.BART("mu",X=[c_hat, d], Y=np.log(e+0.01), m=100) # <--- NOT SURE HOW TO CONCATENATE

    # Define model error
    err2 = pm.Exponential("err2",0.5)
    
    # Define likelihood for model 2
    e_hat = pm.Normal("e_hat", mu=pm.math.exp(mu), sigma=err2, observed=e, shape=mu.shape[0])

    # # --------- Model 3 ---------- #

    # Set prior on GP hyperparameters
    eta = pm.Exponential("eta", lam=1)
    ell = pm.InverseGamma("ell", alpha=2, beta=20)

    # Specify covariance function and GP
    cov = eta**2 * pm.gp.cov.Matern52(1, ls=ell)
    
    # HSGP
    gp = pm.gp.HSGP(m=[20], c=3, cov_func=cov)

    # Function prior
    gp_prior = gp.prior("gp_prior", X=e_hat.reshape((-1,1)), dims="x") # <-- GP requires column vectors, so I assume this works; it does when evaluated outside the model context, e.g., e_hat.reshape((-1,1)).shape.eval()

    # Model err
    err3 = pm.HalfNormal("err3", sigma=3)
    f_hat = pm.Normal("f_hat", mu=gp_prior, sigma = err3, observed=f)

I am having a few issues:

  1. I am not sure how to concatenate the estimate of c (c_hat) with the observed d so they can be passed into BART together. I can see how, with a basic linear regression, I could bypass this by separating the individual model weights. I’ve tried passing them in as a list, concatenating with numpy, and creating a new variable with pm.Data(*), but none seem to work. I can’t find a method in pytensor to do this…

EDIT: I joined the observed data d with the estimate of c (c_hat) using

pt.tensor.stack((c_hat,d)).T

  2. One thing I am thinking about is that when I train the models separately, all the inputs are observed values. However, in this larger model, the next submodel only ever sees estimated values, never observed ones. Is there a way to have the model see only observed values and then later use the estimated values? I guess this is just a consequence of combining the individual submodels into a single full Bayesian model?

  3. As I mentioned above, I found that the input to the GP needed to be ordered. How do I match up the coordinates used for dims in the HSGP, which come from the observed e, with the estimated values e_hat inside the model? Or do I need to create the coords within the model context, between the two submodels? I tried something like this with no luck.

combined.add_coords({"x": pt.tensor.sort(e_hat)})

You can also do pt.concatenate([c_hat[:, None], d[:, None]], axis=-1)

Small nitpick: pt should alias pytensor.tensor, not pytensor itself.
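Putting both points together, the stacking step from your EDIT would read (same operation, standard alias):

import pytensor.tensor as pt

X2 = pt.stack([c_hat, d]).T  # (n, 2): stage-1 RV next to observed d
# equivalently:
X2 = pt.concatenate([c_hat[:, None], d[:, None]], axis=-1)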

Sure, just replace the _hat variables with the observed data, as you were doing previously. It all depends on what you want to achieve, I guess. All the estimated values will be recovered when you do pm.sample_posterior_predictive either way; the question is just whether you want the model uncertainties to propagate.
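Concretely, the only change is which tensor feeds BART’s design matrix (sketch, names from your model above):

import pytensor.tensor as pt

# "Cut" version: stage 2 conditions on the observed data container c,
# so stage-1 uncertainty does not flow downstream
X2 = pt.stack([c, d]).T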

Coords are just labels; they don’t affect computation. If you want to re-order your outputs for the computations, you need to create a mapping from one ordering to the next: basically a numpy array where the i-th entry is an integer saying where that element needs to go. For example, if you just want to sort some vector x in ascending order, you can index it as x[pt.argsort(x)], same as in numpy.
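Applied to the GP input, that could look something like this (untested sketch, reusing e_hat and the HSGP gp from your model; note that argsort of a permutation gives its inverse):

import pytensor.tensor as pt

order = pt.argsort(e_hat)   # permutation that sorts e_hat ascending
inv = pt.argsort(order)     # inverse permutation, back to row order

e_sorted = e_hat[order].reshape((-1, 1))
ff_sorted = gp.prior("ff", X=e_sorted)  # GP sees ordered inputs
ff = ff_sorted[inv]                     # realign with the rows of f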


Thanks @jessegrabowski for all your detailed and helpful answers.
