I’m seeking feedback on a model I’ve been refining for a while. I’ll present two recent versions that illustrate the current results. The models are estimated on 10% of my data because I run out of memory otherwise. The model includes random effects that vary across spatial units. I initially tried ICAR/CAR spatial correlations but ran into problems adjusting the code for sparse matrices (I got it fixed, but an update to the GitHub source relative to the version I was using broke it, and I couldn’t recover my original code, which was all on a remote server). Switching from random effects to ICAR/CAR may help with some of the overfitting and posterior-storage issues. I’ve put my main questions at the top so they aren’t lost in the code and figures.

- How can I strike a balance between fitting the observations perfectly and missing the peak in the observed data? (ChatGPT had some suggestions, but none were quite right.)
- Is it worth getting the ICAR prior working? I think it could help; I just need to go through and adjust some of the PyMC code to handle the input data type. I tried a few things but was still getting errors.
- Is it worth running the model on the full dataset? I’ve tried that with a few versions of the model, and it didn’t seem to help with the current problems.
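On the ICAR question: before wiring it into PyMC, it can help to see what the prior actually does. Below is a minimal NumPy sketch (my own illustration, not the PyMC implementation) of the unnormalised ICAR log-density, which penalises squared differences between neighbouring spatial units; storing the graph as an edge list is the sparse-friendly representation. Recent PyMC versions also ship a built-in `pm.ICAR` distribution, which may save you the manual sparse-matrix work.

```python
import numpy as np

def icar_logp(phi, edges, sigma=1.0):
    """Unnormalised ICAR log-density: penalises squared differences
    between neighbouring spatial units (phi assumed centred)."""
    i, j = edges[:, 0], edges[:, 1]
    return -0.5 * np.sum(((phi[i] - phi[j]) / sigma) ** 2)

# Toy example: 4 units on a line, neighbours stored as index pairs
edges = np.array([[0, 1], [1, 2], [2, 3]])
smooth = np.array([0.0, 0.1, 0.2, 0.3])
rough = np.array([0.0, 1.0, -1.0, 1.0])

# A spatially smooth field gets a higher (less negative) log-density
print(icar_logp(smooth, edges))  # ≈ -0.015
print(icar_logp(rough, edges))   # -4.5
```

The point is that ICAR shares information only between neighbours, so unlike exchangeable random intercepts it borrows strength locally, which is usually what you want for spatial units.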

This is the sampling specification.

`nutpie.sample(compiled_model, chains=6, tune=5000, draws=2000, target_accept=0.93)`

**Model 1**

```python
import numpy as np
import pymc as pm
import pymc_bart as pmb

with pm.Model() as bart_model:
    # BART mean function over the covariates
    w_mean = pmb.BART("w_mean", X=X, Y=y, m=75)
    sigma_y = pm.HalfCauchy("sigma_y", beta=10)
    # Random intercepts
    gamma_group = pm.Normal("gamma_group", mu=0, sigma=1.5, shape=len(np.unique(group_id)))
    gamma_individual = pm.Normal("gamma_individual", mu=0, sigma=1.0, shape=len(group_id))
    # Expected value
    y_hat = pm.Deterministic("y_hat", w_mean + gamma_group[group_id] + gamma_individual)
    # Likelihood of the observed data
    y_i = pm.Normal("y_i", mu=y_hat, sigma=sigma_y, observed=y)
```

**Model 2**

```python
with pm.Model() as bart_model:
    w_mean = pmb.BART("w_mean", X=X, Y=y, m=75)
    sigma_y = pm.HalfCauchy("sigma_y", beta=10)
    # Hyperprior for the random effects
    sigma_group = pm.HalfCauchy("sigma_group", beta=5)
    # sigma_individual = pm.HalfCauchy("sigma_individual", beta=5)
    # Random intercepts
    gamma_group = pm.Normal("gamma_group", mu=0, sigma=sigma_group, shape=len(np.unique(group_id)))
    # gamma_individual = pm.Normal("gamma_individual", mu=0, sigma=sigma_individual, shape=len(group_id))
    # Expected value
    y_hat = pm.Deterministic("y_hat", w_mean + gamma_group[group_id])
    # Likelihood of the observed data (Student-t for robustness)
    y_i = pm.StudentT("y_i", nu=5, mu=y_hat, sigma=sigma_y, observed=y)
```
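The main change in Model 2 besides the hyperprior is swapping the Normal likelihood for a StudentT with `nu=5`. A quick standard-library sketch (my own illustration, not the PyMC internals) shows why this matters: the Student-t log-density penalises a large residual far less than the Normal, so outlying observations pull the fit much less.

```python
from math import lgamma, log, pi

def normal_logpdf(x, mu=0.0, sigma=1.0):
    # Standard Normal log-density
    z = (x - mu) / sigma
    return -0.5 * z**2 - log(sigma) - 0.5 * log(2 * pi)

def studentt_logpdf(x, nu=5.0, mu=0.0, sigma=1.0):
    # Student-t log-density with nu degrees of freedom
    z = (x - mu) / sigma
    return (lgamma((nu + 1) / 2) - lgamma(nu / 2)
            - 0.5 * log(nu * pi) - log(sigma)
            - (nu + 1) / 2 * log(1 + z**2 / nu))

# A 6-sigma residual: catastrophically unlikely under the Normal,
# merely unlikely under StudentT(nu=5)
print(normal_logpdf(6.0))    # ≈ -18.92
print(studentt_logpdf(6.0))  # ≈ -7.28
```

The flip side is that a heavy-tailed likelihood lets the mean function track the bulk of the data more aggressively, which could be related to the overfitting you see in Model 2.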

The R-hat values are reasonably good for both models (<1.05 for many parameters, most <1.35), and the effective sample sizes look good to me.

I am taking a log transformation of the dependent variable rather than using a Lognormal likelihood; it gives much better results.

**Posterior prediction for Model 1** - looks good except it misses the peak in the observed data.
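One candidate explanation for missing the peak is partial-pooling shrinkage: posterior means of random intercepts are pulled toward zero, and the pull is strongest exactly for the extreme groups that form the peak. A conjugate normal-normal sketch (my own illustration with made-up numbers, not your model) shows the effect:

```python
import numpy as np

def shrunk_group_mean(y_group, sigma_y, sigma_group):
    """Posterior mean of a Normal random intercept given group data
    (conjugate normal-normal, prior centred at 0)."""
    n = len(y_group)
    precision_data = n / sigma_y**2
    precision_prior = 1 / sigma_group**2
    w = precision_data / (precision_data + precision_prior)
    return w * np.mean(y_group)

# A small group sitting at the peak of the data
y_peak = np.array([3.0, 3.2, 2.9])
est = shrunk_group_mean(y_peak, sigma_y=1.0, sigma_group=1.5)
print(est)  # ≈ 2.64, pulled below the raw group mean of ≈ 3.03
```

Since posterior-mean predictions always understate extremes to some degree, a fitted curve that slightly misses the observed peak is not necessarily a sign of misspecification.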

I’ve tried various other priors and model specifications. **Model 2** looks to be overfitting…

I ran LOO model comparisons, comparing Model 1 (labeled 18) with a few other models…

Based on the above, I chose model18-m=75 as my comparison model. **Model 2** is model26-m=75 in the figure (m=50 is a check for lower tree count).

According to a discussion on the Stan Discourse, positive LOO values aren’t necessarily a problem, because elpd sums log densities and densities can exceed 1. When running az.compare(), I also get the warning

`UserWarning: Estimated shape parameter of Pareto distribution is greater than 0.70 for one or more samples. You should consider using a more robust model, this is because importance sampling is less likely to work well if the marginal posterior and LOO posterior are very different. This is more likely to happen with a non-robust model and highly influential observations.`

az.loo() suggests the results are not good for **Model 2**.

```
Computed from 12000 posterior samples and 21604 observations log-likelihood matrix.

           Estimate     SE
elpd_loo  139404.48  85.01
p_loo      42487.30      -

There has been a warning during the calculation. Please check the results.
------

Pareto k diagnostic values:
                          Count   Pct.
(-Inf, 0.70]  (good)        113   0.5%
 (0.70, 1]    (bad)        6228  28.8%
   (1, Inf)   (very bad)  15263  70.6%
```

If p_loo > p, then the model is likely to be badly misspecified. If the number of parameters p ≪ N, then PPCs are also likely to detect the problem; see for example the Roaches case study. If p is relatively large compared to the number of observations, say p > N/5 (more accurately, we should count the number of observations influencing each parameter, as in hierarchical models some groups may have few observations and other groups many), it is possible that PPCs won’t detect the problem.
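Plugging in the numbers from the az.loo() output above makes the diagnosis concrete:

```python
# Numbers taken from the az.loo() output above
n_obs = 21_604
p_loo = 42_487.30

print(p_loo > n_obs)      # p_loo exceeds even N itself
print(p_loo > n_obs / 5)  # far past the N/5 threshold where PPCs can miss problems
```

Here p_loo is nearly twice the number of observations, which by the guidance above points to severe misspecification (or to LOO itself failing, consistent with 99.5% of Pareto-k values being bad), and puts the model well inside the regime where a PPC can look fine anyway.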

Given that this is a BART model with random intercepts, there is a large number of parameters relative to the sample size, so according to the guidance above my PPC may not catch the problem. Would fitting on the full dataset help address this issue? Is that right?