Hedges' g effect size with PyMC3

I’m trying to compare the grade point averages of students in class_A and class_B using a Student-t likelihood. Class_A has 100 students and class_B has 60; the grades in the two classes have similar means and fairly similar standard deviations.

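import numpy as np
import pymc3 as pm

# class_A, class_B: 1-D arrays of grades. μ_m and μ_s are not defined in the
# post; assumed here to be the pooled mean and 2x pooled SD (as in the PyMC3 BEST example).
pooled_grades = np.concatenate([class_A, class_B])
μ_m = pooled_grades.mean()
μ_s = pooled_grades.std() * 2

with pm.Model() as model:
    class_A_mean = pm.Normal('class_A_mean', μ_m, sd=μ_s)
    class_B_mean = pm.Normal('class_B_mean', μ_m, sd=μ_s)

    class_A_std = pm.Uniform('class_A_std', lower=1, upper=30)
    class_B_std = pm.Uniform('class_B_std', lower=1, upper=30)

    # Degrees of freedom shared by both groups, shifted so that ν >= 1
    ν = pm.Exponential('ν_min_one', 1/29.) + 1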

with model:
    # StudentT's lam is the precision, 1/σ²
    λ1 = class_A_std**-2
    λ2 = class_B_std**-2

    group1 = pm.StudentT('group1', nu=ν, mu=class_A_mean, lam=λ1, observed=class_A)
    group2 = pm.StudentT('group2', nu=ν, mu=class_B_mean, lam=λ2, observed=class_B)
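In the likelihood above, `lam` is the precision of the Student-t, so `λ1 = class_A_std**-2` makes `class_A_std` act as the distribution's scale (PyMC3's `StudentT` can also take that scale directly through its `sd`/`sigma` argument, depending on the version). The sketch below checks this equivalence numerically without PyMC3: it writes out the precision-parameterized density and compares it with SciPy's scale-parameterized `scipy.stats.t`. The values of ν, the mean and the standard deviation are made up for illustration.

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln


def student_t_pdf_lam(x, nu, mu, lam):
    """Student-t density written in terms of the precision lam = 1 / sigma**2."""
    log_norm = gammaln((nu + 1) / 2) - gammaln(nu / 2) + 0.5 * np.log(lam / (np.pi * nu))
    log_kernel = -(nu + 1) / 2 * np.log1p(lam * (x - mu) ** 2 / nu)
    return np.exp(log_norm + log_kernel)


# Hypothetical values: a class mean of 70 and a standard deviation of 10.
nu, mu, sigma = 30.0, 70.0, 10.0
lam = sigma ** -2  # same transformation as λ1 = class_A_std**-2
x = np.linspace(40, 100, 7)

# SciPy parameterizes the Student-t by scale; scale = 1 / sqrt(lam) = sigma.
via_lam = student_t_pdf_lam(x, nu, mu, lam)
via_scale = stats.t.pdf(x, df=nu, loc=mu, scale=sigma)
print(np.allclose(via_lam, via_scale))  # True
```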

Then, to calculate the effect size:

with model:
    diff_of_means = pm.Deterministic('difference of means', class_A_mean - class_B_mean)
    diff_of_stds = pm.Deterministic('difference of stds', class_A_std - class_B_std)
    # Cohen's d with the two variances averaged (equal weight per class)
    effect_size = pm.Deterministic('effect size',
                                   diff_of_means / np.sqrt(
                                       (class_A_std**2 + class_B_std**2) / 2))
    # njobs = number of parallel processes (renamed `cores` in later PyMC3 releases)
    trace = pm.sample(2000, njobs=2)

This works fine and the Cohen’s d results look reasonable. However, this version of Cohen’s d averages the two variances with equal weight, which ignores the different sample sizes. Since class_A has 100 students and class_B has 60, I would like to use Hedges’ g, which weights each variance by its degrees of freedom:
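$$g = \frac{\mu_A - \mu_B}{s^*}, \qquad s^* = \sqrt{\frac{(n_A - 1)\,\sigma_A^2 + (n_B - 1)\,\sigma_B^2}{n_A + n_B - 2}}$$

where μ and σ are each class's mean and standard deviation and n_A, n_B are the class sizes.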
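The only difference between the two effect sizes is the denominator. The Cohen’s d block averages the two variances with equal weight, while the Hedges’ g formula weights each variance by n - 1. A minimal NumPy comparison with made-up standard deviations (12 and 8) shows that the two denominators coincide when both classes have the same size and diverge with 100 vs 60 students; when the two standard deviations are close, as they are here, the difference is small either way.

```python
import numpy as np


def average_sd(s_a, s_b):
    """Denominator of the Cohen's d block: square root of the mean variance."""
    return np.sqrt((s_a ** 2 + s_b ** 2) / 2)


def pooled_sd(s_a, s_b, n_a, n_b):
    """Denominator of Hedges' g: variances weighted by degrees of freedom."""
    return np.sqrt(((n_a - 1) * s_a ** 2 + (n_b - 1) * s_b ** 2) / (n_a + n_b - 2))


# Hypothetical standard deviations for the two classes.
s_a, s_b = 12.0, 8.0

# With equal class sizes the two denominators coincide...
print(average_sd(s_a, s_b), pooled_sd(s_a, s_b, 60, 60))
# ...with 100 vs 60 students the larger class's variance gets more weight.
print(average_sd(s_a, s_b), pooled_sd(s_a, s_b, 100, 60))
```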

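n_A = len(class_A)  # 100 students
n_B = len(class_B)  # 60 students

# This replaces the Cohen's d block (rebuild the model first): registering a second
# Deterministic named 'difference of means' in the same model raises a ValueError.
with model:
    diff_of_means = pm.Deterministic('difference of means', class_A_mean - class_B_mean)
    diff_of_stds = pm.Deterministic('difference of stds', class_A_std - class_B_std)
    # Hedges' g: difference of means over the sample-size-weighted pooled SD
    hedgesG_effect_size = pm.Deterministic('hedgesG effect size',
                                           diff_of_means / np.sqrt(
                                               ((n_A - 1) * class_A_std**2 + (n_B - 1) * class_B_std**2)
                                               / (n_A + n_B - 2)))
    trace = pm.sample(2000, njobs=2)  # njobs=2 triggers the crash below; njobs=1 runs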
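Because Hedges’ g is a deterministic function of the sampled parameters, it can also be computed after sampling from the stored draws, leaving the model itself unchanged. A minimal sketch, assuming the draws are available as NumPy arrays (with a PyMC3 `MultiTrace`, `trace['class_A_mean']` returns them); the `hedges_g` helper is hypothetical. Hedges’ g is often also multiplied by an approximate small-sample correction J = 1 - 3/(4(n_A + n_B) - 9); with 160 students J is about 0.995, so here it is only an optional flag. The simulated draws are placeholders, not results.

```python
import numpy as np


def hedges_g(mean_a, mean_b, sd_a, sd_b, n_a, n_b, small_sample_correction=False):
    """Hedges' g per posterior draw; array arguments are combined element-wise."""
    mean_a, mean_b = np.asarray(mean_a), np.asarray(mean_b)
    sd_a, sd_b = np.asarray(sd_a), np.asarray(sd_b)
    df = n_a + n_b - 2
    pooled = np.sqrt(((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df)
    g = (mean_a - mean_b) / pooled
    if small_sample_correction:
        # Approximate bias correction J = 1 - 3 / (4 * df - 1) = 1 - 3 / (4 * (n_a + n_b) - 9)
        g = g * (1 - 3 / (4 * df - 1))
    return g


rng = np.random.default_rng(0)
n_draws = 4000
# Simulated stand-ins for trace['class_A_mean'], trace['class_B_mean'], etc.
# These are placeholders, NOT results from the model above.
mean_a = rng.normal(72.0, 1.0, n_draws)
mean_b = rng.normal(70.0, 1.3, n_draws)
sd_a = rng.uniform(9.0, 11.0, n_draws)
sd_b = rng.uniform(8.5, 10.5, n_draws)

g = hedges_g(mean_a, mean_b, sd_a, sd_b, n_a=100, n_b=60)
print(g.mean(), np.percentile(g, [2.5, 97.5]))
```

This yields one g value per draw, so its posterior can be summarized like any other parameter.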

But when I use Hedges’ g, I get an error that shuts down my kernel:

error (200): program aborting due to control-C event

Any idea why this happens with Hedges’ g but not with Cohen’s d?

Changing the number of jobs in pm.sample() from 2 to 1 (njobs=1) avoids error (200). But I wish I understood why njobs=2 works with the Cohen’s d equation but not with Hedges’ g.

This seems to be a joblib error: it cannot pickle the operation. Sometimes these errors are memory related; I see them from time to time on my Mac.
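Running chains in parallel processes generally requires pickling whatever is sent to the workers. To test the pickling explanation directly, one quick check is to attempt a pickle round trip; the sketch uses only the standard library, and `is_picklable` is a hypothetical helper. In the real case you might call it on `model`: a failure would be consistent with the explanation above, while success would suggest looking elsewhere (for example, memory).

```python
import pickle
import threading


def is_picklable(obj):
    """Return True if obj survives a pickle round trip, False otherwise."""
    try:
        pickle.loads(pickle.dumps(obj))
    except Exception:  # failures surface as PicklingError, TypeError, AttributeError, ...
        return False
    return True


# Plain data pickles fine.
print(is_picklable({'n_A': 100, 'n_B': 60}))  # True
# Lambdas cannot be looked up by name, so the standard pickle module rejects them.
print(is_picklable(lambda x: x + 1))  # False
# OS-level lock objects cannot be pickled either.
print(is_picklable(threading.Lock()))  # False
```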

So whenever the problem might be memory related, should we try reducing njobs from 2 to 1?
