I am working on a problem where I want to compare a Bayesian model to an ML model. I was wondering if anyone has suggestions on which approaches would be fair for both methods. I have compared Bayesian models to one another using az.compare, but never a Bayesian model to ML models.

For example, I want to compare a hierarchical linear model to BART to an xgboost model. Is a classic k-fold CV a good approach, or are there other methods for OOS prediction that are preferable? Also, which definition of “error” is valid for both approaches while also being computationally feasible?

Would you want to compare point estimates from Bayesian models (i.e., take the MAP or expected value of the posterior) to point estimates from an ML model? CV would be the way to go. What you’re looking at with az.compare is the result of an approximation of CV that’s not possible to derive for a generic ML model.

A main strength of Bayesian models is that they give full posteriors, whereas uncertainty from ML models is usually tacked on afterward or has to be calibrated, perhaps with an additional Bayesian model, so keep that in mind when comparing point estimates! I think you can use whatever definition of error is most relevant to your context.

So what it comes down to is that our team is slightly split between a traditional ML approach and a Bayesian modeling approach (the direction I lean). The Bayesian approach is a clear winner when it comes to uncertainty; however, we have a lot of data, so using MCMC is significantly slower… (I am exploring VI as an in-between, but I am not quite there yet).

For example, one of the debates we are having is for some features, we model them as a level within a hierarchical model. However, for the ML approach we can only treat them as an additional predictor.

What you’re looking at with az.compare is the result of an approximation of CV that’s not possible to derive for a generic ML model.

I have loved using az.compare for comparing Bayesian models of different complexity, centered vs noncentered, etc., but I understand it isn’t applicable for comparing against ML models.

Would you want to compare point estimates from Bayesian models (i.e., take the MAP or expected value of the posterior) to point estimates from an ML model? CV would be the way to go.

While the uncertainty is important, in order to justify my claim I think I need some measure of OOS prediction accuracy… so I guess to make the comparison fair it would have to be a point-estimate comparison… so maybe something like R^2 or MSE with k-fold CV? Is it fair to compare something like a hierarchical Bayes model to an ML model of slightly different structure?
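To make this concrete, here is the kind of generic k-fold loop I have in mind — a minimal numpy-only sketch, with ordinary least squares as a hypothetical stand-in for either model. Anything that produces point predictions (the posterior mean for the Bayesian model, the raw output for xgboost) would plug into `fit`/`predict`:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data; in practice these would be your real features/targets.
X = rng.normal(size=(200, 3))
beta = np.array([1.5, -2.0, 0.5])
y = X @ beta + rng.normal(scale=0.5, size=200)

def kfold_mse_r2(X, y, fit, predict, k=5, seed=0):
    """Generic k-fold CV: `fit` returns a model, `predict` returns point predictions."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    errs, ss_res, ss_tot = [], 0.0, 0.0
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])
        pred = predict(model, X[test])
        errs.append(np.mean((y[test] - pred) ** 2))
        ss_res += np.sum((y[test] - pred) ** 2)
        ss_tot += np.sum((y[test] - y[test].mean()) ** 2)
    return np.mean(errs), 1 - ss_res / ss_tot

# OLS stands in for any model that yields point predictions.
fit_ols = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
predict_ols = lambda b, X: X @ b

mse, r2 = kfold_mse_r2(X, y, fit_ols, predict_ols)
print(f"CV MSE: {mse:.3f}, CV R^2: {r2:.3f}")
```

The point is that the CV harness only ever sees point predictions, so it is agnostic to whether they came from a posterior or a boosted ensemble.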

So what it comes down to is that our team is slightly split between a traditional ML approach and a Bayesian modeling approach (the direction I lean)

Been there! But why not both?

For example, one of the debates we are having is for some features, we model them as a level within a hierarchical model. However, for the ML approach we can only treat them as an additional predictor.

This is a great point. This situation is handled particularly well by a Bayesian model. For example, say your training data comes from a study done at 5 different hospitals, and using hospital_id as a predictor is very helpful because they did things a bit differently at each hospital. Maybe the ML model will do really well in CV using data from those 5 hospitals, but in the future how do you predict for a hospital that’s not one of those five? The only options are to remove that feature or do something really hacky. A hierarchical Bayesian model is perfect for handling this very common case.
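A toy sketch of that fallback behavior, using empirical-Bayes-style shrinkage as a simplified stand-in for the full hierarchical posterior (all numbers are simulated and the hospital setup is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated study: 5 hospitals with different baseline levels.
true_means = rng.normal(loc=10.0, scale=2.0, size=5)
n_per = 30
y = np.concatenate([rng.normal(m, 1.0, n_per) for m in true_means])
groups = np.repeat(np.arange(5), n_per)

sample_means = np.array([y[groups == g].mean() for g in range(5)])
grand_mean = y.mean()
sigma2 = 1.0  # within-hospital variance, assumed known for this sketch

# Method-of-moments estimate of between-hospital variance, floored at a tiny value.
tau2 = max(sample_means.var() - sigma2 / n_per, 1e-6)

# Partial pooling: shrink each hospital's mean toward the grand mean.
shrink = tau2 / (tau2 + sigma2 / n_per)
pooled = grand_mean + shrink * (sample_means - grand_mean)

# For a hospital never seen in training, fall back on the population level:
# predict the grand mean, with extra spread from the between-hospital variance.
new_hospital_pred = grand_mean
new_hospital_sd = np.sqrt(tau2 + sigma2)
print(pooled, new_hospital_pred, new_hospital_sd)
```

The key design point: the unseen-hospital prediction is not an error case — it is just the population-level distribution, with wider uncertainty than any in-sample hospital.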

I guess overall I’m of the opinion that if you don’t really know the data generation process and you have a lot of data, ML models will often perform better on various metrics (though you’re maybe overfitting a little), because they can adapt to all sorts of non-linear hypothesis spaces without you having to understand much of it. And if you don’t need uncertainty, that’s a great place to use them. But if you do have a good handle on the data generation process and you can represent that structure in a model, then your Bayesian model will probably win. I also think Bayesian models can be more useful because of the explainability aspect. In my experience, a lot of “why” questions come after forecasts or predictions are made, and that’s where Bayesian methods really shine.

But as far as your actual question about specific metrics… not sure! Coverage is another one you could consider, i.e., is the true value within the 80% posterior interval 80% of the time?
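Coverage can be checked directly from posterior predictive draws. A simulated sketch (the draws here are fabricated to be well calibrated by construction, so the empirical number should land near 0.80; in practice you would pull draws from your fitted model's posterior predictive instead):

```python
import numpy as np

rng = np.random.default_rng(2)

# Fake posterior predictive draws for each held-out observation,
# shape (n_draws, n_obs), centered on a latent mean per observation.
n_draws, n_obs = 2000, 500
mu = rng.normal(size=n_obs)
observed = mu + rng.normal(size=n_obs)               # held-out "true" values
draws = mu + rng.normal(size=(n_draws, n_obs))       # well-calibrated draws

lo, hi = np.percentile(draws, [10, 90], axis=0)      # central 80% interval
coverage = np.mean((observed >= lo) & (observed <= hi))
print(f"Empirical 80% coverage: {coverage:.2f}")
```

Coverage well below the nominal level suggests overconfident intervals; well above suggests intervals that are wider than they need to be.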

Totally agree and we definitely use both. Like you said, ML often wins when we have lots of data and don’t worry about some of the points you made about levels/predictors.

This is a great point. This situation is handled particularly well by a Bayesian model. For example, say your training data comes from a study done at 5 different hospitals, and using hospital_id as a predictor is very helpful because they did things a bit differently at each hospital. Maybe the ML model will do really well in CV using data from those 5 hospitals, but in the future how do you predict for a hospital that’s not one of those five? The only options are to remove that feature or do something really hacky. A hierarchical Bayesian model is perfect for handling this very common case.

Love this. Great explanation that really illustrates the point.

Thanks for all your points. I think this does help clarify when/why one is more suitable. Also the coverage suggestion seems really helpful.