Hello everyone,
I’ve been experimenting with using BART for a classification task and I’m encountering some challenges with the accuracy of my predictions. My issue is that the accuracy score for the test dataset is significantly lower than what I anticipated, especially when I compare it with the results from a knn algorithm I used previously. The accuracy for the training dataset is about 0.8, but the accuracy for the test dataset is only about 0.388, while the KNN can reach to 0.77.
df_train_cleaned = pd.read_csv("cleaned_pipeline.csv")
train = df_train_cleaned
X = train.drop('Credit_Score', axis=1)[:1000]
y = train['Credit_Score'][:1000]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42)
categories = y_train.unique()
num_categories = len(categories)
num_obs = len(y_train)
with pm.Model() as bart_model:
x = pm.MutableData("x", X_train)
μ = pmb.BART("μ", X=x, Y=y_train, m=100, shape=(num_categories, num_obs))
θ = pm.Deterministic('θ', pm.math.softmax(μ, axis=0))
y_obs = pm.Categorical("y_obs", p=θ.T, observed=y_train)
idata = pm.sample(draws=500, tune=10000, chains=8, cores=8, return_inferencedata=True)
y_train_pred = idata.posterior["θ"].mean(dim=["draw", "chain"]).argmax(dim="θ_dim_0")
accuracy = accuracy_score(y_train, y_train_pred)
print(f"accuracy: {accuracy}")
with bart_model:
pm.set_data({
"x": X_test
})
pm.sample_posterior_predictive(
idata, extend_inferencedata=True, predictions=True
)
# the argmax is specific to this classification problem
yhat = idata.posterior["θ"].mean(dim=["draw", "chain"]).argmax(dim="θ_dim_0")
accuracy = accuracy_score(y_test, yhat)
print(f"accuracy: {accuracy}")
One thing I’ve noticed is that there aren’t any examples in the documentation specifically about using BART for classification tasks, which makes me wonder if I’m missing something crucial in my approach. If anyone has experience with this or can point out any potential missteps in my code, that would be immensely helpful.
I’m aware that using categoricals and a softmax function for a binary outcome might seem like overkill, but my plan is to extend this project to handle a multiple category dataset in the future, which is why I’ve set it up this way. Still, I’m puzzled by the low accuracy score in its current state.
Another question I have is regarding the recovery of probabilities for the prediction dataset. I’m not entirely sure how to approach this with BART, so any advice or examples on this would be greatly appreciated.
Looking forward to any suggestions or insights you all might have. Thanks in advance for your help!