Vectorisation of BART for multidimensional hierarchical data

I don’t have an answer, but I have a similar question. We now have PyMC-BART, which I’ve been playing with today. I’m also aware of work on hierarchical BART in “Hierarchical Embedded Bayesian Additive Regression Trees”.

This topic also raises the same question.

I’m trying to adapt the radon example to use BART with a hierarchical specification. Based on the BART examples, here is the code I have so far.

import numpy as np
import pandas as pd
import pymc as pm
import pymc_bart as pmb

# Load the radon data and index each observation by county
data = pd.read_csv(pm.get_data("radon.csv"))
data["log_radon"] = data["log_radon"].astype(np.float64)
county_idx, counties = pd.factorize(data.county)
coords = {"county": counties, "obs_id": np.arange(len(county_idx))}

# Design matrix: one row per observation, columns are county index and floor
X = np.vstack((county_idx, data["floor"].to_numpy())).T
Y = data["log_radon"].to_numpy()

with pm.Model(coords=coords, check_bounds=False) as pymc_bart_model:
    # County effects; Y is already on the log scale, so it is passed as is
    mu = pmb.BART("mu", X=X, Y=Y, m=100, dims=["county", "obs_id"])

    sigma = pm.HalfNormal("sigma", sigma=1.5)
    pm.Normal(
        "log_radon", mu=mu, sigma=sigma, observed=data.log_radon.values, dims="obs_id"
    )

I get a similar error:

ValueError: Size length is incompatible with batched dimensions of parameter 0 mu:
len(size) = 1, len(batched dims mu) = 2. Size length must be 0 or >= 2
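
My tentative reading of the error, which I have not checked against the PyMC-BART source, is that passing two dims to BART gives mu two dimensions (county by observation), while the likelihood is declared over obs_id only. The NumPy-only sketch below uses made-up data with hypothetical sizes (12 observations, 3 counties), not the radon file, just to lay out the shapes involved. The name mu_shape_from_dims is mine.

```python
import numpy as np

# Synthetic stand-in for the radon data (hypothetical sizes, not the real file)
rng = np.random.default_rng(0)
n_obs, n_counties = 12, 3
county_idx = rng.integers(0, n_counties, size=n_obs)
floor = rng.integers(0, 2, size=n_obs)
log_radon = rng.normal(size=n_obs)

# Same layout as the design matrix in the model: one row per observation
X = np.column_stack((county_idx, floor))

# Assumption: dims=["county", "obs_id"] implies this shape for mu
mu_shape_from_dims = (n_counties, n_obs)
# The likelihood is declared with dims="obs_id" only
observed_shape = log_radon.shape

print(X.shape)                  # (12, 2)
print(len(mu_shape_from_dims))  # 2
print(len(observed_shape))      # 1
```

If that reading is right, the mismatch comes from the dims passed to BART rather than from the design matrix, though fixing it would not by itself make the model hierarchical.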

I was working off the “Categorical regression” example in the PyMC-BART docs, which I realize deals with a categorical response variable rather than a categorical predictor. Any ideas on whether, and how, a hierarchical structure can be implemented with PyMC-BART as it currently stands?
