I’m trying to compare the grade point averages of students in class_A and class_B using a Student-t model. Class_A has 100 students and class_B has 60; the grades in the two classes have similar means and fairly similar standard deviations.
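    import numpy as np
    import pymc3 as pm

    # μ_m and μ_s (prior mean and sd for the class means) are assumed to be defined earlier
    with pm.Model() as model:
        class_A_mean = pm.Normal('class_A_mean', μ_m, sd=μ_s)
        class_B_mean = pm.Normal('class_B_mean', μ_m, sd=μ_s)
        class_A_std = pm.Uniform('class_A_std', lower=1, upper=30)
        class_B_std = pm.Uniform('class_B_std', lower=1, upper=30)
        # Shifted exponential prior on the degrees of freedom (ν >= 1)
        ν = pm.Exponential('ν_min_one', 1/29.) + 1

    with model:
        # StudentT takes a precision, so convert each std to lam = 1 / std**2
        λ1 = class_A_std**-2
        λ2 = class_B_std**-2
        group1 = pm.StudentT('group1', nu=ν, mu=class_A_mean, lam=λ1, observed=class_A)
        group2 = pm.StudentT('group2', nu=ν, mu=class_B_mean, lam=λ2, observed=class_B)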
Then, to calculate the effect size:
    with model:
        diff_of_means = pm.Deterministic('difference of means', class_A_mean - class_B_mean)
        diff_of_stds = pm.Deterministic('difference of stds', class_A_std - class_B_std)
        effect_size = pm.Deterministic('effect size',
                                       diff_of_means / np.sqrt((class_A_std**2 + class_B_std**2) / 2))
        trace = pm.sample(2000, njobs=2)
This works fine and I get sensible results using Cohen’s d as the effect size. However, Cohen’s d should not be used when the sample sizes differ. Since class A has 100 students and class B has 60, I would like to use the Hedges’ g effect size instead, which weights each class’s variance by its sample size:
    n_A = len(class_A)
    n_B = len(class_B)

    with model:
        diff_of_means = pm.Deterministic('difference of means', class_A_mean - class_B_mean)
        diff_of_stds = pm.Deterministic('difference of stds', class_A_std - class_B_std)
        hedgesG_effect_size = pm.Deterministic(
            'hedgesG effect size',
            diff_of_means / np.sqrt((((n_A-1)*(class_A_std**2)) + ((n_B-1)*(class_B_std**2)))
                                    / (n_A+n_B-2)))
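As a sanity check outside the model, the pooled-standard-deviation formula above can be sketched in plain NumPy. This is only an illustration under stated assumptions: the function name `hedges_g` and its arguments are hypothetical, and the optional small-sample correction factor J = 1 - 3 / (4(n_A + n_B) - 9) is the commonly used approximation, which the model code above does not apply.

```python
import numpy as np


def hedges_g(mean_a, mean_b, std_a, std_b, n_a, n_b, correct_bias=False):
    """Standardized mean difference using a sample-size-weighted pooled std.

    Hypothetical helper: accepts scalars or equal-length arrays
    (for example, posterior draws) for the means and stds.
    """
    mean_a = np.asarray(mean_a, dtype=float)
    mean_b = np.asarray(mean_b, dtype=float)
    std_a = np.asarray(std_a, dtype=float)
    std_b = np.asarray(std_b, dtype=float)

    # Pooled variance: each group's variance weighted by its degrees of freedom
    pooled_var = ((n_a - 1) * std_a**2 + (n_b - 1) * std_b**2) / (n_a + n_b - 2)
    g = (mean_a - mean_b) / np.sqrt(pooled_var)

    if correct_bias:
        # Approximate small-sample correction; an assumption, not in the original code
        g = g * (1.0 - 3.0 / (4.0 * (n_a + n_b) - 9.0))
    return g
```

If the model is sampled without the extra Deterministic, the same quantity could presumably be computed from the trace afterwards, e.g. `hedges_g(trace['class_A_mean'], trace['class_B_mean'], trace['class_A_std'], trace['class_B_std'], n_A, n_B)`.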
But when I use Hedges’ g, I get an error that shuts down my kernel:
error (200): program aborting due to control-C event
Any idea why this is happening with Hedges’ g and not with Cohen’s d effect size?