I’m looking for feedback on a model I’ve been refining for a while. I’ll present two recent versions that illustrate the current results. The models are estimated on 10% of my data because I run out of memory otherwise. The model includes random effects that vary across spatial units. I initially tried ICAR/CAR spatial correlation structures but ran into problems adjusting the code for sparse matrices (I had it working, but an update to the GitHub source relative to the version I was using broke my fix, and I couldn’t recover my original code, which was all on a remote server). Switching from independent random effects to ICAR/CAR may help with some of the overfitting and posterior-storage issues. I’ve put my main questions at the top so they aren’t lost in the code and figures.
- How can I strike a balance between fitting the observations almost perfectly (Model 2) and missing the peak in the observed data (Model 1)? (ChatGPT had some suggestions, but they weren’t great.)
- Is it worth getting the ICAR prior working? I think it could help. I just need to go through and adjust some of the PyMC code to handle the input data type; I tried a few things but kept getting errors. (A rough sketch of what I have in mind is below, after this list.)
- Is it worth running the model on the full dataset? I’ve tried that with a few versions of the model and it didn’t seem to help with the current problems.
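Regarding the ICAR question, this is roughly the direction I have in mind. It is only a sketch, not working code: W_dense stands for a dense 0/1 adjacency matrix over the spatial units, and it uses PyMC’s built-in pm.ICAR distribution, which takes a dense array and would therefore sidestep the sparse-matrix problem I ran into.

import numpy as np
import pymc as pm
import pymc_bart as pmb

with pm.Model() as icar_model:
    w_mean = pmb.BART("w_mean", X=X, Y=y, m=75)
    sigma_y = pm.HalfCauchy("sigma_y", beta=10)
    # spatially structured group effect (ICAR) in place of independent intercepts
    sigma_group = pm.HalfCauchy("sigma_group", beta=5)
    phi = pm.ICAR("phi", W=W_dense, sigma=sigma_group)
    y_hat = pm.Deterministic("y_hat", w_mean + phi[group_id])
    y_i = pm.StudentT("y_i", nu=5, mu=y_hat, sigma=sigma_y, observed=y)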
This is the sampling specification.
nutpie.sample(compiled_model, chains=6, tune=5000, draws=2000, target_accept=0.93)
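(compiled_model here is the output of nutpie’s PyMC compilation step applied to the model defined below, i.e. something like:)

import nutpie
compiled_model = nutpie.compile_pymc_model(bart_model)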
Model 1
import numpy as np
import pymc as pm
import pymc_bart as pmb

with pm.Model() as bart_model:
    w_mean = pmb.BART("w_mean", X=X, Y=y, m=75)
    sigma_y = pm.HalfCauchy("sigma_y", beta=10)
    # Random intercepts
    gamma_group = pm.Normal('gamma_group', mu=0, sigma=1.5, shape=len(np.unique(group_id)))
    gamma_individual = pm.Normal('gamma_individual', mu=0, sigma=1.0, shape=len(group_id))
    # Expected value
    y_hat = pm.Deterministic('y_hat', w_mean + gamma_group[group_id] + gamma_individual)
    # likelihood of the observed data
    y_i = pm.Normal("y_i", mu=y_hat, sigma=sigma_y, observed=y)
Model 2
with pm.Model() as bart_model:
    w_mean = pmb.BART("w_mean", X=X, Y=y, m=75)
    sigma_y = pm.HalfCauchy("sigma_y", beta=10)
    # Hyperpriors for random effects
    sigma_group = pm.HalfCauchy('sigma_group', beta=5)
    # sigma_individual = pm.HalfCauchy('sigma_individual', beta=5)
    # Random intercepts
    gamma_group = pm.Normal('gamma_group', mu=0, sigma=sigma_group, shape=len(np.unique(group_id)))
    # gamma_individual = pm.Normal('gamma_individual', mu=0, sigma=sigma_individual, shape=len(group_id))
    # Expected value
    y_hat = pm.Deterministic('y_hat', w_mean + gamma_group[group_id])
    # likelihood of the observed data
    y_i = pm.StudentT("y_i", nu=5, mu=y_hat, sigma=sigma_y, observed=y)
The R-hat values are reasonably good for both models (below 1.05 for many parameters, most below 1.35), and the effective sample sizes look good to me.
I am taking a log transformation of the dependent variable rather than using a Lognormal likelihood; it gives much better results.
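For clarity, the transformation is applied up front and the predictive draws are exponentiated back for plotting, roughly like this (y_raw is a placeholder for the untransformed outcome and idata for the sampled trace):

import numpy as np

y = np.log(y_raw)  # model log(y) directly; assumes y_raw > 0

# back-transform posterior-predictive draws to the original scale for plots
y_pred = np.exp(idata.posterior_predictive["y_i"])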
Posterior predictive plot for Model 1: it looks good except that it misses the peak in the observed data.
I’ve tried various other priors and model specifications. Model 2 looks to be overfitting…
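For reference, the posterior predictive plots are produced roughly like this (idata is the trace returned by the sampler; names follow the models above):

import arviz as az
import pymc as pm

with bart_model:
    # draw from the posterior predictive and attach it to the trace
    idata = pm.sample_posterior_predictive(idata, extend_inferencedata=True)

az.plot_ppc(idata, num_pp_samples=200)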
I ran LOO model comparisons, comparing Model 1 (labeled model18) with a few other models…
Based on the above, I chose model18-m=75 as my comparison model. Model 2 is model26-m=75 in the figure (m=50 is a check with a lower tree count).
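The comparison itself is done in ArviZ, roughly as follows (the InferenceData names and dictionary keys are illustrative; the pointwise log-likelihood has to be computed and attached to each trace first):

import arviz as az
import pymc as pm

# attach the pointwise log-likelihoods that LOO needs
pm.compute_log_likelihood(idata_18, model=model18)
pm.compute_log_likelihood(idata_26, model=model26)

cmp = az.compare({"model18-m=75": idata_18, "model26-m=75": idata_26}, ic="loo")
az.plot_compare(cmp)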
According to a discussion on the Stan Discourse, positive LOO values aren’t necessarily a problem, because elpd is computed from log densities rather than log probabilities, and densities can exceed one. When running the az.compare() function, I also get the warning
UserWarning: Estimated shape parameter of Pareto distribution is greater than 0.70 for one or more samples. You should consider using a more robust model, this is because importance sampling is less likely to work well if the marginal posterior and LOO posterior are very different. This is more likely to happen with a non-robust model and highly influential observations.
az.loo() suggests the results are not good for Model 2.
Computed from 12000 posterior samples and 21604 observations log-likelihood matrix.
             Estimate       SE
elpd_loo    139404.48    85.01
p_loo        42487.30        -
There has been a warning during the calculation. Please check the results.
------
Pareto k diagnostic values:
                          Count    Pct.
(-Inf, 0.70]  (good)        113    0.5%
 (0.70, 1]    (bad)        6228   28.8%
 (1, Inf)     (very bad)  15263   70.6%
If p_loo > p, then the model is likely to be badly misspecified. If the number of parameters p ≪ N, then PPCs are also likely to detect the problem. See for example the Roaches case study. If p is relatively large compared to the number of observations, say p > N/5 (more accurately we should count the number of observations influencing each parameter, as in hierarchical models some groups may have few observations and other groups many), it is possible that PPCs won’t detect the problem.
Given that this is a BART model with random intercepts, there is a large number of parameters relative to the sample size (p_loo ≈ 42,487 against N = 21,604 observations, well above N/5 ≈ 4,321). According to the guidance above, my PPCs may not catch that problem. If I included the full dataset, it could help address this issue. Is that right?