# Joint Linear Hypothesis - Bayesian Version

Hello everyone!

I have a modeling problem and I cannot find a solution to it.

Let’s start with an oversimplified example dataset that illustrates the problem. Let’s say we are measuring test scores from students in two different schools, some of these students have received 1-on-1 tutoring (“treatment”) and some haven’t:

``````
test_score,treatment,school
1,0,A
2,0,A
21,1,A
22,1,A
2,0,B
3,0,B
3,0,B
10,1,B
11,1,B
12,1,B
``````
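For reference, the raw group means already show the pattern. A quick stdlib-only check (no modeling library assumed), with the dataset inlined:

```python
import csv
import io
from statistics import mean

# Inline copy of the example dataset from the post
data = """test_score,treatment,school
1,0,A
2,0,A
21,1,A
22,1,A
2,0,B
3,0,B
3,0,B
10,1,B
11,1,B
12,1,B"""

rows = list(csv.DictReader(io.StringIO(data)))

# Raw treatment effect per school: mean(treated) - mean(control)
effects = {}
for school in ("A", "B"):
    treated = [float(r["test_score"]) for r in rows
               if r["school"] == school and r["treatment"] == "1"]
    control = [float(r["test_score"]) for r in rows
               if r["school"] == school and r["treatment"] == "0"]
    effects[school] = mean(treated) - mean(control)

print(effects)  # school A's raw effect (20.0) is much larger than school B's (~8.33)
```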

I want to determine whether tutoring affects test scores in either school, while allowing the effect size of tutoring to differ between schools. Without going too deep into frequentist territory, this is usually referred to as a “joint test” or an “F-test”, where the null hypothesis is that the treatment effect in both schools is equal to 0 (statsmodels exposes this as the `f_test` method on fitted regression results).
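For concreteness, here is what that joint test looks like computed by hand with numpy (a sketch of the mechanics, not the statsmodels API): fit the full interaction model, fit the restricted model with both treatment terms dropped, and compare residual sums of squares.

```python
import numpy as np

# Data from the example dataset above
y = np.array([1, 2, 21, 22, 2, 3, 3, 10, 11, 12], dtype=float)
treat = np.array([0, 0, 1, 1, 0, 0, 0, 1, 1, 1], dtype=float)
school_b = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1], dtype=float)

def rss(X, y):
    """Residual sum of squares from an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

ones = np.ones_like(y)
# Full model: intercept, treatment, school, treatment x school interaction
X_full = np.column_stack([ones, treat, school_b, treat * school_b])
# Restricted model: both treatment coefficients forced to 0
X_restr = np.column_stack([ones, school_b])

q = 2                              # number of restrictions (two treatment terms)
df_resid = len(y) - X_full.shape[1]
F = ((rss(X_restr, y) - rss(X_full, y)) / q) / (rss(X_full, y) / df_resid)
print(F)  # an enormous F statistic: the joint null is clearly rejected
```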

The Bayesian analogue is a hierarchical model that allows partial pooling of the treatment effect across schools (sorry, I don’t know how to post LaTeX on this forum):

``````
# Subset into school A and school B datasets
test_scores{A} = alpha{A} + beta{A} * treatment{A}
test_scores{B} = alpha{B} + beta{B} * treatment{B}

# Assume the two betas are drawn from the same hierarchical distribution
beta{A}, beta{B} ~ Cauchy(beta{Global}, 10)
beta{Global} ~ Cauchy(0, 10)
``````

We can then look at the HDI of beta_global to determine whether there is sufficient evidence for a treatment effect. However, this is where the problem appears: the effect size of treatment differs between the two schools, enough so that although the within-school treatment HDI for each school excludes 0, the HDI for the hierarchical effect beta_global includes 0, probably due to a combination of the differing effect sizes and the small sample size.

However, it is clear that the treatment has an effect, and I want a way to measure that effect while avoiding this problem. Is there a way to formulate a Bayesian “joint test” on beta_a and beta_b that yields a single parameter measuring how different these parameters are from 0?*
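One Bayesian analogue of a joint test is to compute, directly from the joint posterior draws, the probability that both (or at least one) of the effects is positive. A sketch with synthetic draws standing in for real MCMC samples (the means and standard deviations below are made up for illustration; in practice these arrays would come from your trace):

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-ins for posterior draws of the two slopes; in practice these would
# come from your MCMC trace (values here are made up for illustration)
beta_a = rng.normal(20.0, 2.0, size=10_000)
beta_b = rng.normal(8.3, 1.5, size=10_000)

# Posterior probability that BOTH treatment effects are positive
p_both = np.mean((beta_a > 0) & (beta_b > 0))
# Posterior probability that AT LEAST ONE treatment effect is positive
p_either = np.mean((beta_a > 0) | (beta_b > 0))
print(p_both, p_either)
```

Because this works draw-by-draw on the joint posterior, it respects any correlation between beta_a and beta_b, unlike combining two marginal HDIs by eye.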

A naive approach would be to simply take a sample-size-weighted average of beta_a and beta_b and look at the HDI of that derived parameter. DBDA (Kruschke’s *Doing Bayesian Data Analysis*) looks at differences of parameters all the time; is summing/averaging parameters similarly acceptable?
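Mechanically, averaging is fine in the same sense as the differencing done in DBDA: any function of the joint posterior draws is itself a posterior quantity. A sketch of the weighted-average idea, again on synthetic draws (sample sizes are from the example dataset; the draw parameters are made up; a central 95% interval is used as a stand-in for a proper HDI, which would need e.g. `arviz.hdi`):

```python
import numpy as np

rng = np.random.default_rng(0)

n_a, n_b = 4, 6  # per-school sample sizes from the example dataset

# Stand-ins for posterior draws of the two slopes (made up for illustration)
beta_a = rng.normal(20.0, 2.0, size=10_000)
beta_b = rng.normal(8.3, 1.5, size=10_000)

# Sample-size-weighted average, computed draw by draw
beta_avg = (n_a * beta_a + n_b * beta_b) / (n_a + n_b)

# Central 95% credible interval (a proper HDI needs e.g. arviz.hdi)
low, high = np.percentile(beta_avg, [2.5, 97.5])
print(low, high)  # interval well above 0 for these synthetic draws
```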

Code snippet and dataset:
example_data.csv (105 Bytes)
analyze_example_data.py (2.9 KB)

*NOTE: while it is theoretically possible for beta_a and beta_b to have opposite signs while both are non-zero, this is unlikely to happen in my real-world dataset for reasons outside the scope of this post.

---

I think the problem is that you only have 2 schools, so your global posterior is not strongly constrained by your data.

What you are suggesting sounds like completely pooling the estimates of the two schools instead of partially pooling them, as the hierarchical model does. If that is what you want, you can model it that way.
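The complete-pooling version can be sketched as school-specific intercepts with a single shared slope (plain OLS here just to show the structure; a Bayesian fit would put the same design matrix in the likelihood):

```python
import numpy as np

# Data from the example dataset in the post
y = np.array([1, 2, 21, 22, 2, 3, 3, 10, 11, 12], dtype=float)
treat = np.array([0, 0, 1, 1, 0, 0, 0, 1, 1, 1], dtype=float)
in_a = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0], dtype=float)
in_b = 1.0 - in_a

# One intercept per school, but a single shared (completely pooled) slope
X = np.column_stack([in_a, in_b, treat])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
alpha_a, alpha_b, beta_pooled = coef
print(beta_pooled)  # a single treatment effect estimate across both schools
```

The pooled slope lands between the two per-school effects, weighted by the within-school spread of the treatment indicator.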

In terms of extracting summary statistics afterwards, you can compute whatever you want from the posterior draws; just keep in mind that the model’s uncertainty about the global effect may be the plausible answer here, even if it feels underwhelming.