I am translating a PyMC model to use a BART specification. I have a couple questions about the ability/syntax to replicate model structures using BART. My base model is given below where X is a set of covariates that vary across observations and Z is a set of covariates that vary across groups of observations.
cbsa, mn_cbsa = df.cbsa.factorize()
coords = {"cbsa": mn_cbsa, "cbsa_vars":Z.columns.values, "cbg_vars":X.columns.values}
n_obs = df.shape[0]
with pm.Model(coords=coords) as icar_model:
# Priors
sigma_a = pm.HalfCauchy("sigma_a", 5)
# prior for alpha
alpha = pm.Beta("alpha", alpha=2, beta=2)
# variance parameter of the spatially dependent random effects
tau_spat = pm.Gamma("tau_spat", alpha=1.0, beta=1.0)
phi = pm.CAR("phi", mu=np.zeros(n_obs), tau=tau_spat, alpha=alpha, W=adj_matrix)
# CBSA model
gamma_0 = pm.Normal("gamma_0", mu=0.0, sigma=10.0)
gamma_1 = pm.Normal("gamma_1", mu=0.0, sigma=10.0, dims="cbsa_vars")
# Density model for intercept
mu_a = pm.Deterministic("mu_a", gamma_0 + pt.dot(Z, gamma_1))
# CBSA variation not explained by covariates
epsilon_a = pm.Normal("epsilon_a", mu=0, sigma=1, dims="cbsa")
const = pm.Deterministic("const", mu_a + sigma_a * epsilon_a, dims="cbsa")
# Common slope
beta = pm.Normal("beta", mu=0.0, sigma=10.0, dims="cbg_vars")
# Model error
sigma_y = pm.Uniform("sigma_y", lower=0, upper=100)
# Expected value
y_hat = const[cbsa] + pt.dot(X,beta) + phi
# likelihood of the observed data
y_i = pm.Normal("y_i", mu=y_hat, sigma=sigma_y, observed=y)
I am able to combine X and Z into a single matrix and estimate the model as follows
coords = {"cbg_vars":X.columns.values}
n_obs = df.shape[0]
rng = np.random.default_rng(42)
with pm.Model(coords=coords) as bart_model:
# Priors
sigma_a = pm.HalfCauchy("sigma_a", 5)
# prior for alpha
alpha = pm.Beta("alpha", alpha=2, beta=2)
# variance parameter of the spatially dependent random effects
tau_spat = pm.Gamma("tau_spat", alpha=1.0, beta=1.0)
phi = pm.CAR("phi", mu=np.zeros(n_obs), tau=tau_spat, alpha=alpha, W=adj_matrix)
w = pmb.BART("w_cbg", X=X, Y=y, m=100)
# Model error
sigma_y = pm.Uniform("sigma_y", lower=0, upper=100)
# Expected value
y_hat = w + phi
# likelihood of the observed data
y_i = pm.Normal("y_i", mu=y_hat, sigma=sigma_y, observed=y)
Question 1: Can heteroskedasticity be applied to this model? Modeling Heteroscedasticity with BART — PyMC-BART
The example appears to have 1 covariate and specifying a shape of (2,n_obs) allows for the specification of a variance term. However, I cannot make it work with multiple covariates (tried both (2,n_obs) and (n_variables+1,n_obs). I get parallelsampling and parametervalue errors.
Question 2. Does it matter that I had to change the group-level variables into individual variables in a BART model? The variable values will still be the same across observations within the same group. The difference in a standard Bayesian model would be the number of random parameters would differ for epsilon_a = pm.Normal("epsilon_a", mu=0, sigma=1, dims="cbsa")
.
I came across some other examples at Cohort Retention Analysis with BART - Dr. Juan Camilo Orduz, which use a MutableData syntax and some other features. Would those features help with any of these questions?