Model Comparison: How to constrain a model to positive mu-differences?

Thanks for your help! I believe your answer is the solution to the question I had. Thanks for your recommendations on the modelling as well. As I mentioned I wasn’t sure about the model choices and methods to use. If you don’t mind, I have some follow-up questions.

I’m not sure what you mean by modeling single trials. Having a probability distribution for each single trial? The observed data I produce in the example below (and above) contains up to 90 trials per participant (trial count differs per block and participant). My previous code was transforming the single trials so their mean would give the variance.
Having a random variable for the standard deviation would give me what I need, right? Or did you mean to rather do the modelling for the untransformed projections? In that case I changed the example to the one below with untransformed projections. Though I’m not sure a normal distribution is the right fit for the real data. I think a Shapiro-Wilk test indicated non-normality.

short version:

with pm.Model() as projection_model:
    projection_model.name = "Main Effect Projection"
    sigma_ortho = pm.Gamma("sigma_ortho", mu=3.34, sigma=1.44, shape=n_users)
    sigma_parallel = pm.Gamma("sigma_parallel", mu=7.0, sigma=3.17, shape=n_users)
    # Alternatively: The sum of two gamma-distributed variables is gamma-distributed.
    #sigma_diff = pm.Gamma('sigma_diff', mu=4.0, sigma=5.0, shape=n_users)
    #sigma_parallel = pm.Deterministic("sigma_parallel", sigma_ortho + sigma_diff)
    
    obs_parallel = pm.Normal(f"obs_parallel", mu=0.0, sigma=sigma_parallel, shape=n_users, observed=proj_effect_data['parallel'])
    obs_orthogonal = pm.Normal(f"obs_orthogonal", mu=0.0, sigma=sigma_ortho, shape=n_users, observed=proj_effect_data['orthogonal'])
    sigma_diff = pm.Deterministic('sigma_diff', sigma_parallel - sigma_ortho)

full example:

# %%
import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc3 as pm
import seaborn as sns


# %%
# Create some data conforming to null model: all deviations on average equal across blocks.
n_trials = 90 - np.random.randint(36)
df0 = pd.DataFrame({'user': 0, 'block': 1 + np.random.randint(3, size=n_trials),
                    'parallel': np.random.normal(0.0, 5.5, size=n_trials),
                    'orthogonal': np.random.normal(0.0, 5.5, size=n_trials)})
# Make sure mean of each block is zero as with real data.
df0[['parallel', 'orthogonal']] = df0.groupby('block')[['parallel', 'orthogonal']].transform(lambda x: x - x.mean())

# %%
# Create some data conforming to main effect of projection model: variance in parallel > orthogonal.
n_trials = 90 - np.random.randint(36)
df1 = pd.DataFrame({'user': 1, 'block': 1 + np.random.randint(3, size=n_trials),
                    'parallel': np.random.normal(0.0, 7.0, size=n_trials),
                    'orthogonal': np.random.normal(0.0, 3.3, size=n_trials)})
# Make sure mean of each block is zero as with real data.
df1[['parallel', 'orthogonal']] = df1.groupby('block')[['parallel', 'orthogonal']].transform(lambda x: x - x.mean())

# %%
df = pd.concat((df0, df1), axis='index')

# %%
df[['user', 'block']] = df[['user', 'block']].astype('category')
df.sort_values(by=['user', 'block'], inplace=True)

# %%
# Vizualize the tranformed data.
samples = df.melt(id_vars='user', value_vars=['parallel', 'orthogonal'], var_name='direction', value_name='projection')
g = sns.FacetGrid(samples, col='user', hue="direction")
g.despine(offset=10)
g = (g.map(sns.distplot, "projection").add_legend(title="Projection", loc='upper right'))
for ax in g.axes.flat:
    ax.set_ylabel("Probability Density")
    ax.set_xlabel("Projection Length")
plt.tight_layout()
plt.show()

# %%
# Massage data into single distributions per user. Each column holds all data for 1 user.
# Since not all users have the same amount of data, we will have missing data at the ends of the columns.
null_data = df.melt(id_vars='user', value_vars=['parallel', 'orthogonal'], var_name='direction', value_name='projection').drop('direction', axis='columns')
null_data['idx'] = null_data.groupby('user').cumcount()
null_data = null_data.pivot(index='idx', columns='user', values='projection')
n_users = df['user'].nunique()

# %%
with pm.Model() as null_model:
    null_model.name = "Null Model"
    sigma = pm.Gamma("sigma", alpha=2.2, beta=0.4, shape=n_users)

    obs = pm.Normal(f"obs", mu=0.0, sigma=sigma, shape=n_users, observed=null_data)

# %%
with null_model:
    idata0 = pm.sample(1000, tune=800, return_inferencedata=True)

# %%
az.summary(idata0, var_names=["Null Model_sigma"])

# %%
# Massage data into single distributions per user and projection direction.
proj_effect_data = df.loc[:, ['user', 'parallel', 'orthogonal']]
proj_effect_data['idx'] = proj_effect_data.groupby('user').cumcount()
proj_effect_data = proj_effect_data.pivot(index='idx', columns='user', values=['parallel', 'orthogonal'])

# %%
with pm.Model() as projection_model:
    projection_model.name = "Main Effect Projection"
    sigma_ortho = pm.Gamma("sigma_ortho", mu=3.34, sigma=1.44, shape=n_users)
    sigma_parallel = pm.Gamma("sigma_parallel", mu=7.0, sigma=3.17, shape=n_users)
    # Alternatively: The sum of two gamma-distributed variables is gamma-distributed.
    #sigma_diff = pm.Gamma('sigma_diff', mu=4.0, sigma=5.0, shape=n_users)
    #sigma_parallel = pm.Deterministic("sigma_parallel", sigma_ortho + sigma_diff)

    
    obs_parallel = pm.Normal(f"obs_parallel", mu=0.0, sigma=sigma_parallel, shape=n_users, observed=proj_effect_data['parallel'])
    obs_orthogonal = pm.Normal(f"obs_orthogonal", mu=0.0, sigma=sigma_ortho, shape=n_users, observed=proj_effect_data['orthogonal'])
    sigma_diff = pm.Deterministic('sigma_diff', sigma_parallel - sigma_ortho)


# %%
with projection_model:
    idata1 = pm.sample(1000, tune=800, return_inferencedata=True)

# %%
az.summary(idata1, var_names=["Main Effect Projection_sigma"], filter_vars='like')

My final goal is to compare the marginal likelihoods of several different theoretically driven models of directed hypotheses (null, main effect parallel > orthogonal, main effect of block 2 on both directions, different interaction effects, …)
I don’t yet understand how the parallel > orthogonal counts would lead me to the marginal likelihoods of the models.
I was previously using your bridge sampling implementation this kind of way (not exactly):

def get_marginal_likelihood(model, trace):
    with model:
        marginal_log_likelihood = bridgesampler.marginal_llk(trace, model=model, maxiter=10000)["logml"]
    return marginal_log_likelihood

# models and traces for each particpant, I hadn't vectorized it yet at the time.
users = [1,2,3,4,5]  # and so on.
model_posterior = dict()
for user in users:
    model_posterior[user] = np.array([get_marginal_likelihood(*model) for model in zip(models[user], traces[user])])  # models[user] would contain all (unvectorized) models for this user.
    # Exponentiate and normalize to turn into a probability distribution.
    model_posterior[user] = np.exp(model_posterior-np.logaddexp.reduce(model_posterior[user]))

So you’re saying using a LOO-CV based method would yield more accurate results?
I tried az.compare({null_model.name: idata0, projection_model.name: idata1}), but it fails with:

TypeError: Found several log likelihood arrays ['Projection Effect_obs_parallel', 'Projection Effect_obs_orthogonal'], var_name cannot be None

I read about the error in this answer, but I don’t understand it enough to translate it to my application.

1 Like