Why were the observed values in the out-of-sample prediction the true values of the training set, rather than the true values of the test set?

While studying the Out-of-Sample Predictions section of the Biking with BART case study, I ran into a question about the posterior predictive checks for the train and test sets.
Why are the observed values in the posterior predictive plots for both the training set and the test set the true values of the training samples?
In other machine learning methods, we usually use a model built on the training set to predict the test set. In the out-of-sample prediction with PyMC, it seems that the test set is used to build the model and predict the training samples.

The code is as follows:

import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pymc as pm
import pymc_bart as pmb

# X_train, X_test, Y_train and RANDOM_SEED are defined earlier in the notebook

# modeling and train set prediction
with pm.Model() as model_oos_regression:
    X = pm.MutableData("X", X_train)
    Y = Y_train
    α = pm.Exponential("α", 1)
    μ = pmb.BART("μ", X, np.log(Y))
    y = pm.NegativeBinomial("y", mu=pm.math.exp(μ), alpha=α, observed=Y, shape=μ.shape)
    idata_oos_regression = pm.sample(random_seed=RANDOM_SEED)
    posterior_predictive_oos_regression_train = pm.sample_posterior_predictive(
        trace=idata_oos_regression, random_seed=RANDOM_SEED
    )

# test set prediction
with model_oos_regression:
    X.set_value(X_test)  # swap in the test features; the observed Y is still Y_train
    posterior_predictive_oos_regression_test = pm.sample_posterior_predictive(
        trace=idata_oos_regression, random_seed=RANDOM_SEED
    )

# compare the posterior predictive distribution with the observed data
fig, ax = plt.subplots(
    nrows=2, ncols=1, figsize=(8, 7), sharex=True, sharey=True, layout="constrained"
)

az.plot_ppc(
    data=posterior_predictive_oos_regression_train, kind="cumulative", observed_rug=True, ax=ax[0]
)
ax[0].set(title="Posterior Predictive Check (train)", xlim=(0, 1_000))

az.plot_ppc(
    data=posterior_predictive_oos_regression_test, kind="cumulative", observed_rug=True, ax=ax[1]
)
ax[1].set(title="Posterior Predictive Check (test)", xlim=(0, 1_000));


If I change the code as follows, it does not work. Maybe pymc-bart does not support pm.MutableData for the Y data.
with pm.Model() as model:
    X = pm.MutableData("X", X_train)
    Y = pm.MutableData("Y", Y_train)
    α = pm.Exponential("α", 1)
    μ = pmb.BART("μ", X, np.log(Y))
    y = pm.NegativeBinomial("y", mu=pm.math.exp(μ), alpha=α, observed=Y)
    trace = pm.sample(random_seed=RANDOM_SEED)
    posterior_predictive_train = pm.sample_posterior_predictive(trace, random_seed=RANDOM_SEED)

    pm.set_data({"X": X_test, "Y": Y_test})
    posterior_predictive_test = pm.sample_posterior_predictive(trace, random_seed=RANDOM_SEED)


TypeError                                 Traceback (most recent call last)

Hi @JJ_Py

The value of “Y” that you pass to pmb.BART is only used to compute an initial value for the leaf_nodes (PyMC-BART then tries to find better ones during tuning). It does not need to be Y, but passing Y, or a transformation such as log(Y), happens to be a good starting point. So you can do:

with pm.Model() as model:
    X = pm.MutableData("X", X_train)
    Y = pm.MutableData("Y", Y_train)
    α = pm.Exponential("α", 1)
    μ = pmb.BART("μ", X, np.log(Y_train))

Regarding the comparison, that was probably just a matter of convenience when writing the example. @juanitorduz?

Note that you can also do this to get the comparison you want.

fig, ax = plt.subplots(
    nrows=2, ncols=1, figsize=(8, 7), sharex=True, sharey=True, layout="constrained"
)

# draw only the posterior predictive curves, then overlay the observed ECDF by hand
az.plot_ppc(
    data=posterior_predictive_oos_regression_train,
    kind="cumulative",
    observed=False,
    num_pp_samples=100,
    ax=ax[0],
)
ax[0].ecdf(Y_train, label="observed", color="k")
ax[0].set(title="Posterior Predictive Check (train)", xlim=(0, 1_000))

az.plot_ppc(
    data=posterior_predictive_oos_regression_test,
    kind="cumulative",
    observed=False,
    num_pp_samples=100,
    ax=ax[1],
)
ax[1].ecdf(Y_test, label="observed", color="k")
ax[1].set(title="Posterior Predictive Check (test)", xlim=(0, 1_000));
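
Axes.ecdf is only available in Matplotlib 3.8 and later. On older versions the same observed curve can be computed by hand. The sketch below is a minimal, dependency-free version of that idea; the helper name `ecdf_points` and the toy counts are hypothetical stand-ins, not part of the original notebook.

```python
def ecdf_points(values):
    """Return (xs, ps): the sorted values and the ECDF height reached at each one."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one observation")
    # After the i-th smallest value (1-based), a fraction i / n of the data is <= it.
    ps = [(i + 1) / n for i in range(n)]
    return xs, ps


# Hypothetical stand-ins for Y_train and Y_test.
y_train_demo = [12, 40, 40, 95, 180, 310]
y_test_demo = [8, 55, 120, 260]

xs_train, ps_train = ecdf_points(y_train_demo)
xs_test, ps_test = ecdf_points(y_test_demo)

# With Matplotlib, these can be drawn as step curves, for example:
# ax[1].step(xs_test, ps_test, where="post", color="k", label="observed")
```

The point of computing the curve separately for each panel is that the train panel is compared against Y_train and the test panel against Y_test.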

Thank you very much. I also found that, for the training set, the empirical cumulative distribution curves obtained with ax.ecdf() and az.plot_ppc() are almost the same, while for the test set there are small differences.
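
One plausible reading of this observation: az.plot_ppc takes its observed curve from the data the model was fitted on, i.e. Y_train, in both panels, while ax.ecdf(Y_test) draws the test targets. On the training panel the two curves are then the same data, and on the test panel they differ by however much the train and test samples differ. A rough way to quantify that gap is the largest vertical distance between the two empirical CDFs (the two-sample Kolmogorov-Smirnov statistic). This is only a sketch; the helper name `max_ecdf_gap` and the toy numbers are hypothetical.

```python
from bisect import bisect_right


def max_ecdf_gap(sample_a, sample_b):
    """Largest vertical distance between the ECDFs of two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    gap = 0.0
    # Both ECDFs are step functions, so the largest gap occurs at an observed value.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # share of sample_a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # share of sample_b that is <= x
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Hypothetical stand-ins for Y_train and Y_test.
y_train_demo = [12, 40, 40, 95, 180, 310]
y_test_demo = [8, 55, 120, 260]

gap_train_vs_train = max_ecdf_gap(y_train_demo, y_train_demo)  # same data, no gap
gap_train_vs_test = max_ecdf_gap(y_train_demo, y_test_demo)    # different samples
print(gap_train_vs_train, gap_train_vs_test)
```

Running the same function on the real Y_train and Y_test would show how much of the difference between the two test-panel curves comes from the train/test split itself.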

@JJ_Py Sorry for the late reply. As mentioned by @aloctavodia, Y_train is used for the variance. Note that the observed (likelihood) term just depends on the shape of mu and, therefore, the shape of X_test. To avoid confusion, I like to work with coordinates to make this even clearer. See, for example, the model in "Cohort Retention Analysis with BART - Dr. Juan Camilo Orduz", where I do out-of-sample predictions with a BART model using coordinates.