Hi,
I am looking to extend Kruschke’s BEST model for multi-group comparisons (3 in my particular case).
In his paper, Kruschke says:
For multiple groups, on the other hand, the model of Figure 2 can be extended in two ways. First, of course, every group is provided with its own µj and σj parameter, but with ν shared by all groups. Second, and importantly, the model can be provided with a higher-level distribution across the group means, if desired. This higher-level distribution describes the distribution of the µj across groups, wherein the overall mean of the groups is estimated, and the between group variability is estimated.
I started an implementation with pymc:
from itertools import combinations
from pathlib import Path
import numpy as np
import pandas as pd
import pymc3 as pm
def best(data, value, group):
"""
This model replicates the example used in:
Kruschke, John. (2012) Bayesian estimation supersedes the t-test. Journal of Experimental Psychology: General.
The original model is extended to handle multiple groups.
Parameters
----------
data: pandas.DataFrame
Tidy pandas dataframe
value: str
Name of the column holding the values
group: str
Name of the column holding the groups
Returns
-------
pymc3.Model
"""
groups = data[group].unique()
# pooled empirical mean of the data and twice the pooled empirical standard deviation
mu = data[value].mean()
sd = data[value].std() * 2
# group standard deviations priors
σ = [0.05, 0.5]
with pm.Model() as model:
groups_means = {
igroup: pm.Normal(f"{igroup}_mean", mu=mu, sd=sd) for igroup in groups
}
groups_std = {
igroup: pm.Uniform(f"{igroup}_std", lower=σ[0], upper=σ[-1])
for igroup in groups
}
# prior for ν exponentially distributed with a mean of 30
ν = pm.Exponential("ν_minus_one", lam=1 / 29.0) + 1
# precision (transformed from standard deviations)
λ = {igroup: groups_std[igroup] ** -2 for igroup in groups}
likelihoods = {
igroup: pm.StudentT(
igroup,
nu=ν,
mu=groups_means[igroup],
lam=λ[igroup],
observed=data.query(f'{group} == "{igroup}"')[value].dropna(),
)
for igroup in groups
}
delta_means, delta_std, effect_size = {}, {}, {}
for a, b in combinations(groups, 2):
a_minus_b = f"{a} - {b}"
delta_means[a_minus_b] = pm.Deterministic(
f"Δμ ({a_minus_b})", groups_means[a] - groups_means[b]
)
delta_std[a_minus_b] = pm.Deterministic(
f"Δσ ({a_minus_b})", groups_std[a] - groups_std[b]
)
effect_size[a_minus_b] = pm.Deterministic(
f"effect size ({a_minus_b})",
delta_means[a_minus_b]
/ np.sqrt((groups_std[a] ** 2 + groups_std[b] ** 2) / 2),
)
return model
- Is this a correct implementation to compare three groups with each other? I’m not sure I correctly applied Kruschke’s second point. I imagine he implies some kind of hierarchical model?
the model can be provided with a higher-level distribution across the group means
- Do you know a way to use a similar model to do a multivariate analysis (e.g. three groups but with three different variables)