Hedges' g effect size with PyMC3

adam · April 27, 2018, 8:28pm

I’m trying to compare the grade point averages of students in class_A and class_B using a Student-t likelihood. Class_A has 100 students and class_B has 60; the grades in the two classrooms have similar means and fairly similar standard deviations.

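import numpy as np
import pymc3 as pm

# μ_m, μ_s (prior mean and sd) and the data arrays class_A, class_B are assumed to be defined earlier.
with pm.Model() as model: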
    class_A_mean = pm.Normal('class_A_mean', μ_m, sd=μ_s)
    class_B_mean = pm.Normal('class_B_mean', μ_m, sd=μ_s) 
    
    class_A_std = pm.Uniform('class_A_std', lower=1, upper=30)
    class_B_std = pm.Uniform('class_B_std', lower=1, upper=30)
    
    ν = pm.Exponential('ν_min_one', 1/29.) + 1

with model:
    λ1 = class_A_std**-2
    λ2 = class_B_std**-2 

    group1 = pm.StudentT('group1', nu=ν, mu=class_A_mean, lam=λ1, observed=class_A)
    group2 = pm.StudentT('group2', nu=ν, mu=class_B_mean, lam=λ2, observed=class_B)

Then, to calculate effect size:

with model:

    diff_of_means = pm.Deterministic('difference of means',  class_A_mean - class_B_mean)
    diff_of_stds = pm.Deterministic('difference of stds',  class_A_std - class_B_std)
    effect_size = pm.Deterministic('effect size',
                                   diff_of_means / np.sqrt(
                                   (class_A_std**2 + class_B_std**2) / 2))
    trace = pm.sample(2000, njobs=2)

This works perfectly fine and I get nice results using Cohen’s d as the effect size. However, Cohen’s d with an unweighted average of the two variances is not recommended when the sample sizes differ. Since class A has 100 students and class B has 60, I would like to use the Hedges’ g effect size, which pools the variances weighted by sample size:

n_A = len(class_A)
n_B = len(class_B)

with model:
    diff_of_means = pm.Deterministic('difference of means', class_A_mean - class_B_mean)
    diff_of_stds = pm.Deterministic('difference of stds', class_A_std - class_B_std)
    # Standardize by the pooled standard deviation, weighted by sample size
    hedgesG_effect_size = pm.Deterministic(
        'hedgesG effect size',
        diff_of_means / np.sqrt(
            ((n_A - 1) * class_A_std**2 + (n_B - 1) * class_B_std**2) / (n_A + n_B - 2)))
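
For reference, the expression above is the difference in means standardized by the sample-size-weighted pooled standard deviation. Hedges' g as usually defined also multiplies this by a small-sample correction factor, approximately 1 - 3/(4(n_A + n_B) - 9). Below is a minimal NumPy sketch of both versions, assuming arrays of posterior draws. The function name `hedges_g` and the simulated draws are hypothetical and not part of the model above.

```python
import numpy as np

def hedges_g(mean_a, mean_b, std_a, std_b, n_a, n_b, correct=True):
    """Standardized mean difference using the pooled standard deviation.

    Inputs may be scalars or arrays of posterior draws (hypothetical helper).
    """
    # Pooled variance, weighting each group's variance by its degrees of freedom
    pooled_var = ((n_a - 1) * std_a**2 + (n_b - 1) * std_b**2) / (n_a + n_b - 2)
    g = (mean_a - mean_b) / np.sqrt(pooled_var)
    if correct:
        # Approximate small-sample bias correction factor
        g = g * (1.0 - 3.0 / (4.0 * (n_a + n_b) - 9.0))
    return g

# Simulated stand-ins for posterior draws; the values are made up for illustration.
rng = np.random.default_rng(0)
mean_a = rng.normal(75.0, 1.0, size=4000)
mean_b = rng.normal(73.0, 1.2, size=4000)
std_a = rng.uniform(8.0, 12.0, size=4000)
std_b = rng.uniform(8.0, 12.0, size=4000)

g_draws = hedges_g(mean_a, mean_b, std_a, std_b, n_a=100, n_b=60)
print(g_draws.mean())
```

With 160 observations in total the correction factor is about 0.995, so the corrected and uncorrected values nearly coincide. The same function could presumably be applied to draws pulled from the trace after sampling (for example `trace['class_A_mean']`), which keeps the effect size calculation out of the model graph.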

But when I use Hedges’ g, I get an error that shuts down my kernel:

error (200): program aborting due to control-C event

Any idea why this is happening with Hedges’ g and not with Cohen’s d effect size?

adam · April 28, 2018, 2:41am

Changing njobs in pm.sample() from 2 to 1 avoided error (200). But I wish I understood why njobs=2 works with the Cohen’s d equation but not with Hedges’ g.

junpenglao · April 28, 2018, 7:01am

This seems to be an error related to joblib being unable to pickle the operation; sometimes these errors are memory related. I see it from time to time on my Mac.

adam · April 28, 2018, 10:59am

So whenever the problem might be memory related, we should try reducing njobs from 2 to 1?
