Joint Linear Hypothesis - Bayesian Version

Hello everyone!

I have a modeling problem and I cannot find a solution to it.

Let’s start with an oversimplified example dataset that illustrates the problem. Let’s say we are measuring test scores from students in two different schools, some of these students have received 1-on-1 tutoring (“treatment”) and some haven’t:


I want to determine if tutoring affects test scores in either school while allowing the effect size of tutoring to differ between schools. Without going into too much frequentist stuff this is usually referred to as a “joint test” or an “F-test” where the null hypothesis is that the treatment effect in either school is equal to 0. See statsmodels implementation here. More info here.

A similar Bayesian equivalent is a hierarchical model allowing for pooling of the treatment effect across schools (sorry, I don’t know how to post LaTeX on this forum):

Subset school A and school B into two datasets
test_scores{A} = alpha{A} + beta{A} * treatment{A}
test_scores{B} = alpha{B} + beta{B} * treatment{B}

Assume the two betas are sampled from the same hierarchical distribution
beta{A,B} ~ Cauchy(beta{Global}, 10)
beta{Global} ~ Cauchy(0, 10)

Where we can look at the HDI for beta_global to determine if there is sufficient evidence for the treatment effect in either school. However, this is where we see the problem:


The effect size of treatment within the two schools is different, different enough that although the treatment HDI within each school does not include 0 the HDI for the hierarchical effect, beta_global, does include 0, probably due to a combination of different effect sizes between schools and small sample size.

However, it is clear that the treatment has an effect and I want a way to measure that effect while negating this problem. Is there a way to formulate a Bayesian “joint test” on beta_a and beta_b that gives a single parameter that measures how different either of these parameters are from 0?*

A naive approach would be to simply take a sample size weighted average of beta_a and beta_b and look at the HDI of that parameter. DBDA looks at differences of parameters all the time, is summing/averaging parameters acceptable?

Thank you in advance!

Code snippet and dataset:
example_data.csv (105 Bytes) (2.9 KB)

*NOTE: while it is theoretically possible that beta_a and beta_b could have opposite effect sizes while both are non-zero this is unlikely to happen in my real world dataset for reasons outside of the scope of this post.

I think the problem is you only have 2 schools, so your global posterior is not strongly conditioned by your data.

What you are suggesting sounds like complete pooling the estimates of the two schools instead of partial pooling like the hierarchical model does. If that’s what you want you can model it like that.

In terms of extracting summary statistics afterwards you can do whatever you want, just keep in mind that your model uncertainty about the global effect may be the plausible thing even if it feels underwhelming.

Thank you for your response!

That’s a good idea, but unfortunately complete pooling is not possible in my real world dataset for domain-specific reasons. To attempt to extend the toy example above the test scores cannot be pooled directly because school A gave a math test and school B gave an English test. The statistic I want is a measure of how much tutoring affects either math or English grades across schools.

It sounds like I have a solution: a weighted sum of the individual school treatment effect parameters. I have already tried this in my real world dataset and it gives some expected positive results, thank you for confirming this is acceptable!