Out of sample predict issue

Number_Huang · June 20, 2023, 4:55am

The following code is from the BART bikes example; I changed it to a classifier. It now reports a shape mismatch, so it seems that the pm.set_data call with X_test has no effect.

from pathlib import Path
import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc as pm
import pymc_bart as pmb
from sklearn.model_selection import train_test_split
bikes = pd.read_csv(pm.get_data("bikes.csv"))
features = ["hour", "temperature", "humidity", "workingday"]
X = bikes[features]
Y = bikes["count"]
Y2 = Y.apply(lambda x:1 if x>180 else 0)
RANDOM_SEED=100
X_train, X_test, Y_train, Y_test = train_test_split(X, Y2, test_size=0.2, random_state=RANDOM_SEED)
with pm.Model() as model_oos_regression:
    X1 = pm.MutableData("X", X_train.values)
    Y1 = Y_train.values.flatten()
    #α = pm.Exponential("α", 1)
    μ = pmb.BART("μ", X1, Y1)
    #y = pm.NegativeBinomial("y", mu=pm.math.exp(μ), alpha=α, observed=Y, shape=μ.shape)
    #y = pm.Deterministic("y", pm.invlogit(μ))
    pm.Bernoulli("y",observed=Y1,p=pm.Deterministic("p1", pm.invlogit(μ)))
    idata = pm.sample(random_seed=RANDOM_SEED)
    #idata_oos_regression = pm.fit(method=pm.ADVI()).sample()

    #predict out sample
    pm.set_data({"X":X_test.values})
    # posterior_predictive_oos_regression_test = pm.sample_posterior_predictive(
    #     trace=idata_oos_regression, random_seed=RANDOM_SEED,
    #     var_names=['y'],
    #     return_inferencedata=True,
    #     predictions=True
    # )
    idata.extend(pm.sample_posterior_predictive(idata))
    #pred = posterior_predictive_oos_regression_test.predictions
    yHat = idata.posterior_predictive['y'].mean(("chain", "draw")).to_numpy()
    print(f"yHat-len={len(yHat)},X_test-len={len(X_test)}")
    assert len(yHat)==len(X_test)

ricardoV94 · June 20, 2023, 5:49am

You have to specify how the shape of y depends on its parameters. It’s illustrated in the examples here: pymc.set_data — PyMC 5.5.0 documentation

Otherwise you need to provide dummy values for y with the correct shape

Number_Huang · June 20, 2023, 6:11am

Thank you, Ricardo, but it is still not clear to me.
pm.Bernoulli("y", observed=Y1, p=pm.Deterministic("p1", pm.invlogit(μ)), shape=Y1.shape)
That is what I changed, and it still fails.
In the set_data documentation, x and y have the same shape, but that is not the case for me.

ricardoV94 · June 20, 2023, 6:42am

The shape should depend on μ somehow? Otherwise shape=Y1.shape is the default anyway. If you have no other source of shape information other than the observations, you will need to use dummy variables when doing posterior predictive, to force the right shape

Number_Huang · June 20, 2023, 7:05am

Yes, μ.shape works! Thank you, Ricardo.
Another question, about this line:
pm.Bernoulli("y", observed=Y1, p=pm.Deterministic("p1", pm.invlogit(μ)), shape=μ.shape)
For prediction, which variable should I predict, "y" or "p1"?
My understanding is that p1 is invlogit(μ), i.e. the sigmoid value, which should correspond to a classifier's predict_proba. But what is "y"?

ricardoV94 · June 20, 2023, 7:07am

y is a Bernoulli draw from p1. You can predict whichever is more useful for you (or both).
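
To make the relation concrete, here is a small NumPy-only sketch. The p1 draws are simulated from Beta distributions purely as a hypothetical stand-in for posterior draws; they do not come from the model in this thread. Averaging p1 over draws gives something like predict_proba, each y draw is a 0/1 outcome, and the mean of the y draws approximates the mean of p1.

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws, n_obs = 4000, 5

# Hypothetical posterior draws of p1, one column per test observation.
p1_draws = rng.beta(a=[1, 2, 4, 8, 9], b=[9, 8, 6, 2, 1], size=(n_draws, n_obs))

# y is one Bernoulli draw per p1 draw, so every entry is 0 or 1.
y_draws = rng.binomial(1, p1_draws)

proba = p1_draws.mean(axis=0)        # analogous to predict_proba
y_freq = y_draws.mean(axis=0)        # noisier estimate of the same quantity
labels = (proba > 0.5).astype(int)   # hard class labels via a 0.5 threshold

print(np.round(proba, 2), np.round(y_freq, 2), labels)
```

In this sketch p1 is the smoother summary, while y additionally carries the sampling noise of the outcome itself, which matters when simulated outcomes are wanted rather than probabilities.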

ricardoV94 · June 20, 2023, 7:08am

Linking to the new entry in the FAQ for future readers: Frequently Asked Questions - #18 by ricardoV94
