I’m trying to compare the grade point averages of students in class_A and class_B using a Student-t likelihood. Class_A has 100 students and class_B has 60; the grades in the two classes have similar means and fairly similar standard deviations.
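import numpy as np
import pymc3 as pm

# μ_m, μ_s (prior hyperparameters) and the class_A / class_B grade arrays
# are defined earlier (not shown here).
with pm.Model() as model:
    class_A_mean = pm.Normal('class_A_mean', μ_m, sd=μ_s)
    class_B_mean = pm.Normal('class_B_mean', μ_m, sd=μ_s)
    class_A_std = pm.Uniform('class_A_std', lower=1, upper=30)
    class_B_std = pm.Uniform('class_B_std', lower=1, upper=30)
    ν = pm.Exponential('ν_min_one', 1/29.) + 1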
with model:
    λ1 = class_A_std**-2
    λ2 = class_B_std**-2
    group1 = pm.StudentT('group1', nu=ν, mu=class_A_mean, lam=λ1, observed=class_A)
    group2 = pm.StudentT('group2', nu=ν, mu=class_B_mean, lam=λ2, observed=class_B)
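As a side note on the `lam` arguments above: they are precisions, i.e. the inverse squares of the scale parameters. A minimal NumPy sketch of that conversion, using a hypothetical helper name (`sd_to_precision`) and made-up values rather than anything from my model:

```python
import numpy as np

def sd_to_precision(sd):
    """Convert a scale (standard-deviation-like) value to a precision, lam = sd**-2."""
    return np.asarray(sd, dtype=float) ** -2

# Made-up scale values covering the range of the Uniform(1, 30) prior
print(sd_to_precision([1.0, 10.0, 30.0]))
```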
Then, to calculate effect size:
with model:
    diff_of_means = pm.Deterministic('difference of means', class_A_mean - class_B_mean)
    diff_of_stds = pm.Deterministic('difference of stds', class_A_std - class_B_std)
    effect_size = pm.Deterministic('effect size',
                                   diff_of_means / np.sqrt(
                                       (class_A_std**2 + class_B_std**2) / 2))
    trace = pm.sample(2000, njobs=2)
This works perfectly fine and I get nice results, using Cohen’s d as the effect size. However, this form of Cohen’s d weights the two variances equally, so it should not be used when the sample sizes differ. Since class A has 100 students and class B has 60, I would like to use the Hedges’ g effect size:
n_A = len(class_A)
n_B = len(class_B)
with model:
    diff_of_means = pm.Deterministic('difference of means', class_A_mean - class_B_mean)
    diff_of_stds = pm.Deterministic('difference of stds', class_A_std - class_B_std)
    hedgesG_effect_size = pm.Deterministic('hedgesG effect size',
                                           diff_of_means / np.sqrt(
                                               (((n_A-1)*(class_A_std**2)) + ((n_B-1)*(class_B_std**2))) / (n_A+n_B-2)))
But when I use Hedges’ g, I get an error that shuts down my kernel:
error (200): program aborting due to control-C event
Any idea why this is happening with Hedges’ g and not with Cohen’s d effect size?
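For reference, here is a minimal NumPy sketch of the two effect-size formulas from the model code, evaluated outside the model on made-up numbers. The function names and all input values are hypothetical, chosen only for illustration. The last helper is the usual small-sample correction factor for Hedges’ g, approximately 1 - 3 / (4(n_A + n_B) - 9), which my model expression does not include. This is not meant as an explanation of the crash, only as a way to check the arithmetic independently of PyMC3.

```python
import numpy as np

def equal_weight_effect(mean_a, mean_b, sd_a, sd_b):
    """Effect size using the unweighted average of the two variances."""
    return (mean_a - mean_b) / np.sqrt((sd_a**2 + sd_b**2) / 2)

def pooled_effect(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """Effect size using the sample-size-weighted (pooled) variance."""
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / np.sqrt(pooled_var)

def hedges_correction(n_a, n_b):
    """Approximate small-sample correction factor for Hedges' g."""
    return 1 - 3 / (4 * (n_a + n_b) - 9)

# Made-up stand-ins for posterior draws (NOT real results)
rng = np.random.default_rng(0)
mean_a = rng.normal(75.0, 1.0, size=1000)
mean_b = rng.normal(73.0, 1.0, size=1000)
sd_a = rng.uniform(9.0, 11.0, size=1000)
sd_b = rng.uniform(9.0, 11.0, size=1000)

d = equal_weight_effect(mean_a, mean_b, sd_a, sd_b)
g = pooled_effect(mean_a, mean_b, sd_a, sd_b, 100, 60) * hedges_correction(100, 60)
print(d.mean(), g.mean())
```

With group sizes of 100 and 60 the correction factor is very close to 1, so the main difference between the two expressions is how the variances are weighted.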