I have created 4 bivariate linear regressions, where the outcome is the average grade in English Language Arts, based on 4 separate predictors. Each marker/dot represents a school. Here’s the scatter plot.

Here is the slope (β) for each model, by predictor:
Chronic absence: -0.26
Hispanic-Black: -0.28
Economic Need: -0.30
Math Proficiency: 0.35

Standard Deviation (σ) of the mean when predictor is:
Chronic Absence: 0.26
Hispanic-Black: 0.23
Economic Need: 0.21
Math Proficiency: 0.12

Can I get help with the analysis of these results?

Am I correct to assume that Math Proficiency is the strongest predictor of ELA Proficiency because its slope is furthest away from zero?

Why does the standard deviation decrease from model to model? Does that signify more certainty around the mean of ELA proficiency, or more certainty around the predictor?

The 97% HPDI for ELA predictions (orange), where the predictor is math proficiency, is the narrowest, compared to all other models. Does that signify that it is the strongest predictor?

Is there anything else that I can learn from the visual comparison of these models that I’m missing?

(1) In a way, yes: the slope is the global linear rate of change, so you could call it the "strongest predictor". However, none of the predictors alone gives you the whole story, and they might be correlated. Thus it is hard to isolate a predictive effect; I would tend to report all effects and avoid singling one out.
How much a variable predicts also depends on the distribution of its values and on its cross-correlation with the other variables (correlation of the values, and even of their slopes). This holds true even for standardized values like yours. For illustration: pretend the "Percent Black/Hispanic" plot had the strongest slope. Most of its values cluster above 0.5, so even that steep slope would not help you predict big differences between most of the schools in your data set.
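A quick numeric sketch of that point, with synthetic data (none of these numbers come from the actual school dataset): the spread of a model's fitted values is |β| times the spread of the predictor, so a steep slope on tightly clustered values separates schools less than a shallower slope on widely spread values.

```python
import numpy as np

# Hypothetical illustration with invented numbers, not the real data.
rng = np.random.default_rng(0)

x_clustered = rng.normal(0.8, 0.05, 500)  # values cluster tightly (e.g. > 0.5)
x_spread = rng.normal(0.5, 0.30, 500)     # values spread over the whole range

beta_steep, beta_shallow = -0.50, -0.28   # made-up slopes for the comparison

# Spread of the fitted values = |beta| * SD(x): this is what separates schools.
spread_clustered = abs(beta_steep) * x_clustered.std()
spread_spread = abs(beta_shallow) * x_spread.std()

print(spread_clustered)  # ~0.025: steep slope, but predictions barely differ
print(spread_spread)     # ~0.084: shallower slope, yet a wider prediction range
```

So the shallower slope, applied to a well-spread predictor, actually produces the larger differences between predicted schools.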

So it depends on which question is being asked. For me, "what is the strongest predictor?" does not capture the essence of the problem (multiple predictors). But it can be done; see below.

(2) I would interpret the σ here not as a standard deviation of the mean, but as the residual scale of the model (the standard deviation of the error term, usually denoted ε). It is not a standard error of the mean (standard errors shrink as the number of observations grows; this one shouldn't). Don't confuse it with the "range of plausible true values" of your predictor slopes β (the 95% HPD interval, your green area), and remember that the intercept also has a 95% HPDI which adds on top of that green area.
A lower residual σ means that the variable x explains more of the variation in y; someone else might have a clearer formulation.
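To see the distinction concretely, here is a small synthetic example (ordinary least squares rather than the Bayesian fit, and all numbers invented): the estimated residual σ stays near its true value as the sample grows, while the standard error of the mean keeps shrinking.

```python
import numpy as np

# Sketch with synthetic data: sigma is a property of the data-generating
# process, the standard error of the mean is not.
rng = np.random.default_rng(1)
true_sigma = 0.2

def fit_residual_sd(n):
    x = rng.normal(0, 1, n)
    y = 0.35 * x + rng.normal(0, true_sigma, n)
    beta = np.polyfit(x, y, 1)       # ordinary least squares line
    resid = y - np.polyval(beta, x)
    return resid.std(ddof=2)         # estimate of sigma (2 fitted parameters)

for n in (100, 10_000):
    print(n, round(fit_residual_sd(n), 3), round(true_sigma / np.sqrt(n), 4))
# Middle column stays near 0.2; right column (standard error) keeps shrinking.
```

That is the sense in which "σ doesn't scale with the number of observations" while a standard error does.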

(3) I guess you got the orange intervals from posterior predictive (PPC) sampling? They indicate how well your model fits the data and where prediction is insufficient (outliers). Outliers are okay, since orange covers only 95% of the data, but they should be distributed evenly over the parameter range (showing, for example, that for the "chronically absent" predictor a linear model is maybe a bit coarse, but still okay).
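A plain-numpy sketch of what that orange band does (in practice it would come from the model's posterior predictive samples, e.g. `pm.sample_posterior_predictive` plus an HDI; everything here, including the parameter values, is simulated):

```python
import numpy as np

# Simulated stand-in for posterior predictive checking; all values invented.
rng = np.random.default_rng(2)
x = rng.uniform(0, 1, 200)
y = 0.35 * x + rng.normal(0, 0.12, 200)            # "observed" data

# Pretend these are posterior draws of the slope and residual scale.
beta_draws = rng.normal(0.35, 0.02, 1000)
sigma_draws = np.abs(rng.normal(0.12, 0.005, 1000))

# Posterior predictive draws for every observed x: shape (1000 draws, 200 pts).
y_rep = beta_draws[:, None] * x[None, :] \
    + rng.normal(0, 1, (1000, 200)) * sigma_draws[:, None]

lo, hi = np.percentile(y_rep, [2.5, 97.5], axis=0)  # the "orange" 95% band
coverage = np.mean((y >= lo) & (y <= hi))
print(coverage)  # should land near 0.95; points outside are expected outliers
```

If coverage is far from the nominal level, or the uncovered points bunch up in one region of x, the model is missing structure there.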

(4) You can learn that the models fit well, but also that there were outliers. You also learn about the distributions of your data. So the plots are very informative and well polished.

You could try to combine the models and add βs for all your predictors in the same model, possibly even as a multivariate structure to account for correlations (see here). That makes it harder to interpret, but the model should be more accurate.
To estimate which is the strongest predictor, I would then fit four extra models where each of the slopes is left out in turn. Then you can do model comparison (e.g. LOO) to compare the predictive power of the reduced models with the full one.
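Here is a rough sketch of that leave-one-predictor-out procedure using plain OLS and a simple holdout score; in the Bayesian workflow you would fit the models in PyMC and compare them with ArviZ's LOO instead. The data, names, and coefficients are all made up (deliberately set so one predictor dominates), just to show the mechanics:

```python
import numpy as np

# Invented data: four standardized predictors, one of which clearly matters.
rng = np.random.default_rng(3)
names = ["absence", "black_hispanic", "econ_need", "math"]
X = rng.normal(0, 1, (400, 4))
y = X @ np.array([-0.05, -0.05, -0.10, 0.35]) + rng.normal(0, 0.12, 400)

train, test = slice(0, 300), slice(300, None)

def holdout_mse(cols):
    A = np.column_stack([np.ones(300), X[train][:, cols]])
    beta, *_ = np.linalg.lstsq(A, y[train], rcond=None)
    A_test = np.column_stack([np.ones(100), X[test][:, cols]])
    return np.mean((y[test] - A_test @ beta) ** 2)

full = holdout_mse([0, 1, 2, 3])
for j, name in enumerate(names):
    reduced = holdout_mse([k for k in range(4) if k != j])
    print(f"drop {name:14s} -> MSE rises by {reduced - full:+.4f}")
# The biggest rise marks the predictor whose absence hurts prediction most.
```

PSIS-LOO (via `az.compare` on the fitted PyMC models) answers the same question with out-of-sample predictive density instead of holdout MSE.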

I know you accepted @Falk’s excellent answer already, but here is my 2 cents.

Am I correct to assume that Math Proficiency is the strongest predictor of ELA Proficiency because its slope is furthest away from zero?

No, not in general, because your predictors are not standardized. If I really wanted to know “which is the strongest” predictor here, I would do the following:

Prior to building any models, I would visualize the relationships between predictors. Perhaps some of them are highly correlated.

Standardize each predictor (i.e., so it has unit standard deviation). This will allow easy comparison of β values across predictors.

I would also standardize the outcome measure so that βs are interpretable as effect sizes.

Fit multiple models reflecting the power set of combinations of predictors (also, don’t forget about interactions).

Evaluate the goodness of fit of all models (after ensuring convergence) using relative indices such as WAIC, LOO-CV, etc., as well as absolute goodness of fit using PPCs.

Based on careful consideration of all of the results above, come to a conclusion about which predictors are useful and which are redundant, if any.
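The standardize-and-enumerate part of the steps above can be sketched as follows (column names are placeholders, and the fitting/WAIC scoring that would happen inside the loop, e.g. with PyMC and ArviZ, is omitted):

```python
import numpy as np
from itertools import combinations

# Made-up stand-in for the school data; only the mechanics matter here.
rng = np.random.default_rng(4)
data = {name: rng.normal(size=100)
        for name in ["absence", "black_hispanic", "econ_need", "math", "ela"]}

# Standardize predictors and outcome so betas read as effect sizes.
z = {name: (v - v.mean()) / v.std() for name, v in data.items()}

# Power set of predictor combinations (excluding the empty model).
predictors = ["absence", "black_hispanic", "econ_need", "math"]
models = [combo for r in range(1, 5) for combo in combinations(predictors, r)]
print(len(models))  # 15 candidate models
for combo in models[:3]:
    print("ela ~", " + ".join(combo))  # e.g. "ela ~ absence"
```

With interactions included the candidate set grows quickly, which is one more reason to lean on LOO/WAIC rather than eyeballing fits.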

Thank you, @sammosummo
I thought I did standardize the predictors. Anyhow, I will definitely fit multiple models to reflect the power set of combinations. I'm learning to do that right now with Richard McElreath's book Statistical Rethinking. He also covers LOO in a subsequent chapter.