Bambi Beta Regression Inference

Hi

The documentation of bambi has a nice documentation page about beta regression: Beta Regression — Bambi 0.9.0 documentation

However I’m unsure about how to interpret coefficients in a model with more than one coefficient. As the math gets more complicated with more than one the described way in the guide is not suitable anymore (please correct me if I’m wrong). I think the recommended way forward is to use marginal effects of specific variables while holding others constant. Is there some built in functionality in bambi/pymc3 to get to these marginal effects of the coefficients or other avenues to interpret beta regression coefficients (dummym,continuous and discrete)? I mostly find resources in this direction for R and most look rather elaborate (A guide to modeling proportions with Bayesian beta and zero-inflated beta regression models | Andrew Heiss)

That is what I used (in reality its more coefficients)
model = bmb.Model(“”“y_target ~ b_A + b_B + b_C”“”, df, family=“beta”)
model_fitted = model.fit(draws=2000, tune=5000)

The best way to understand marginal effects of a given model, in my opinion, is to work out the partial derivatives.

First you need to know your link function, because this determines the functional form of the estimated mean. Schlepping through the source code, it looks like the default link function for beta regression is the inverse logit function. Per the documentation you linked, Bambi uses the mean-sigma parameterization of the beta distribution, so you’re directly modeling the mean and you don’t have to do anything crazy to recover parameters. Thus, to understand the effect of a change in the value of a covariate on the estimated mean, we just need to solve the following partial derivative:

\frac{\partial \hat \mu_i}{\partial x_j} = \frac{1}{1 + \exp(-X_i\beta)} = \beta_j\frac{\exp (-X_i\beta)}{(1 + \exp(-X_i\beta))^2} =\beta_j \hat \mu_i^2 \exp(-X_i\beta)

Where X_i is a 1 \times k row vector of covariate values (perhaps the i-th row of the design matrix, hence my choice of nomenclature) and \beta is a k \times 1 column vector of parameters, with \beta_j, j \leq k as the coefficient associated with the j-th covariate.

Some observations:

  1. The marginal effect is a function of the the covariates themselves, both through the estimated average \mu, and in the exponent term. To get a single number, you have to plug in values for X to evaluate this expression. This could be a particular unit of interest in your study, or you could plug in the sample averages to get the so-called “average marginal effect” (this is what e.g. stata gives you with the margins, dydx post-estimation command (ok boomer))

  2. Because we are working with logits, all the interpretation baked into logistic regression is available to us, if we want it. For example, one could interpret the above equation as marginal effects on the probability scale. I.e., given values for X_i, a unit change in x_j is associated with a such-and-such percent change in the probability of observing an outcome value of \hat \mu_i = 1.

  3. Likewise, you could use the log-odds interpretation of the parameters themselves, by supposing that \hat \mu_i = \frac{1}{1 + \exp (-X_i \beta)} = \frac{\exp(X_i \beta)}{1 + \exp (X_i \beta)} = p_i, so that \ln \left ( \frac{p_i}{1 - p_i} \right ) = X_i \beta (p_i is the probability that \hat y_i = 1). In this interpretation, you can directly read off the coefficients, then go straight to wikipedia and read about log-odds for a half hour because you once again forgot how to interpret them (perhaps I’m projecting).

If your regression models a probability, the interpretations borrowed from logistic regression are quite appealing. If you are modeling something else, you will have to be more careful in your reasoning. In either case, staring at the partial derivative is a great place to start. I suggest making a lot of plots and experimenting with changes to the parameters. Desmos is a nice tool for doing this interactively.

1 Like

Thanks for your reply, jessegrabowski

The link would be logit.

As there is no single marginal effect as it depends where on the slope we look at. Thus I would need to consider the other coefficients as well. I currently use a very naive approach (where don’t consider other coefficients bar the intercept) just taking

mean_effect = model_fitted.posterior.data_vars[coef].mean().item()
naive_marginal_effect = expit(intercept + 1*mean_effect) - expit(intercept) #for dummy

Multiple ways to go about it with pros and cons (Econometrics - Marginal Effects for Probit and Logit (and Marginal Effects in R) - YouTube)
-Calculate each observations marginal effect
-Marginal Effect of a Representative: Pick some set of of variables and calculate the marginal effect
-Average Marginal Effect (AME): Calculate each observations marginal effect and take the mean
-Marginal Effect at the Mean (MEM): Calculate the average of each variable (like 25% blonde), then get the marginal effect for some hypothetical observation with those mean values.

I’d prefer AME but I fail to program it correctly let alone efficiently in python. That is why I am asking if there was some resources where it i described to how to do it cleanly in a bambi/pymc3/python way like in the article by Andrew Heiss for R, I linked in the original post.

I wouldn’t set all the other coefficients to zero, since that doesn’t produce the correct marginal effect for your model.

Where are you stumbling with computing the AME? Are you able to compute the row-wise expected mean, with something like mu_hat = expit(np.einsum('ik, ...k -> ...i', data, estimated_coefficients))? (or this might just be in your posterior idata already?) Once you have this, it seems straight-forward enough to get row-wise marginal effects for a particular parameter following either the equation in the youtube vid, or the one I wrote out above. mu_hat should have shape (chains, draws, n_obs), so computing beta * mu_hat * (1 - mu_hat) will give you the entire posterior distribution over the marginal effect of beta for your whole dataset. From that you can take the mean over the last dimension to get the distribution over AMEs.

To directly answer your question, I’m not aware of any packages that wrap up all this functionality for you. Of course, this doesn’t mean that it doesn’t exist. My personal observation is that the python stats ecosystem seems to have fewer of these types of helper packages, relative to R.

2 Likes

Thanks again for your quick replies. So you were right this is already in the model_fitted object available.

pexpit=model2_fitted.posterior_predictive.P_alive.values #values are already in original form. No expit() necessary (?)
mu_hat=np.concatenate((pexpit[0,:,:],pexpit[1,:,:]),axis=0) #both chains
mu_hat_mean=mu_hat.mean(axis=0) #average over all draws
indexfuncfunc=np.multiply(mu_hat_mean,1-mu_hat_mean).mean(axis=0) #average over all observations

#AME = indexfuncfunc * some_beta

As a sidenote. I get quite different results depending on when I average the observations, meaning if I don’t calculate the mu_hat_mean but calculate the indexfuncfunc with mu_hat and then averaging I get 0.071 instead of 0.191 in my case.