Hi! I’m new to PyMC3 and Bayesian modelling in general. I’m not even sure my question is appropriate to my problem, and I’m unsure about my modelling choices, so please be a bit patient with me. Thanks!
There’s a code example following the explanations.
Background:
I’ve got data from an experiment with a repeated-measures/within-subject design (the real data are from a mixed design, though the between factor doesn’t concern me right now). Each participant does 3 consecutive blocks of 30 trials. Due to outlier removal and missing data, the resulting blocks do not all have the same length. The data are lengths of projections of 2D points onto a vector (called parallel in the data) and onto its orthogonal vector. I’m interested in the difference in variance between these projection directions, parallel (\parallel) and orthogonal (\perp).
Goal:
I’ve done the classic ANOVA approach (on synergy indices as a measure of the ratio between projections in an uncontrolled manifold analysis, for those who are interested), but I’d like to learn from the data how plausible certain theoretical models about the structure of variance are, given the observed data. My goal is to arrive at this kind of comparison between models. I used @junpenglao’s bridge sampling implementation, but didn’t parameterize the models very well.
Approach:
The mean of the projections is zero in each block. Therefore, the mean of the squared projections is the variance, so I can compare mean squared projections. Squared projections are all positive, but there are a few extreme values, the variance seems to increase with the mean, the squared data are skewed to the right, and the range of values is large. These were all indicators to me that I should log-transform the squared data.
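To spell out the first step: the variance in a block is \frac{1}{n}\sum_i (x_i - \bar{x})^2, and with \bar{x} = 0 this reduces to \frac{1}{n}\sum_i x_i^2, i.e. the mean of the squared projections.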
Here I’ll start with 2 simple models: the null model, which proposes that all data are generated by the same distribution, and a model for a main effect of projection direction (parallel > orthogonal). To come back to my question: how do I make sure that for the main-effect model the estimated \mu_\parallel > \mu_\perp?
The difference in the means of the transformed projections between participants is so large that there’s overlap in my priors for \mu_\parallel & \mu_\perp. So it could happen that the model estimates 2 almost identical values for \mu_\parallel & \mu_\perp, when in theory I want the likelihood of the data given that \mu_\parallel > \mu_\perp, and subsequently P(\mu_\parallel > \mu_\perp|D) vs. P(\mu_\parallel = \mu_\perp|D).
In my mind I would need to standardize the data somehow, so that I can make the priors for \mu_\parallel and \mu_\perp non-overlapping? Or somehow constrain the difference between the 2 to be positive? You see, I’m a bit confused about it.
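To make the second idea concrete, this is roughly what I have in mind (just a sketch, assuming pymc3 is imported as pm like in the full example below; the prior values are placeholders, not tuned to my data):

with pm.Model() as ordered_sketch:
    mu_ortho = pm.Normal("mu_ortho", mu=0.0, sigma=2.5)
    delta = pm.HalfNormal("delta", sigma=2.5)  # difference constrained to be >= 0
    mu_parallel = pm.Deterministic("mu_parallel", mu_ortho + delta)

But I don’t know whether that’s a sensible way to encode \mu_\parallel > \mu_\perp for the model comparison. Here’s the full code example: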
# %%
import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc3 as pm
import seaborn as sns
# %%
# Create some data conforming to null model: all deviations on average equal across blocks.
n_trials = 90 - np.random.randint(36)
df0 = pd.DataFrame({'user': 0, 'block': 1 + np.random.randint(3, size=n_trials),
                    'parallel': np.random.normal(0.0, 4.0, size=n_trials),
                    'orthogonal': np.random.normal(0.0, 4.0, size=n_trials)})
# Make sure mean of each block is zero as with real data.
df0[['parallel', 'orthogonal']] = df0.groupby('block')[['parallel', 'orthogonal']].transform(lambda x: x - x.mean())
# %%
# Create some data conforming to main effect of projection model: variance in parallel > orthogonal.
n_trials = 90 - np.random.randint(36)
df1 = pd.DataFrame({'user': 1, 'block': 1 + np.random.randint(3, size=n_trials),
                    'parallel': np.random.normal(0.0, 5.0, size=n_trials),
                    'orthogonal': np.random.normal(0.0, 1.0, size=n_trials)})
# Make sure mean of each block is zero as with real data.
df1[['parallel', 'orthogonal']] = df1.groupby('block')[['parallel', 'orthogonal']].transform(lambda x: x - x.mean())
# %%
df = pd.concat((df0, df1), axis='index')
# %%
df[['user', 'block']] = df[['user', 'block']].astype('category')
df.sort_values(by=['user', 'block'], inplace=True)
# log-transform squared deviations.
df[['parallel', 'orthogonal']] = df[['parallel', 'orthogonal']].transform('square').transform('log')
# %%
# Visualize the transformed data.
samples = df.melt(id_vars='user', value_vars=['parallel', 'orthogonal'], var_name='projection', value_name='ln(proj_sq)')
g = sns.FacetGrid(samples, col='user', hue="projection")
g.despine(offset=10)
g = (g.map(sns.distplot, "ln(proj_sq)").add_legend(title="Projection", loc='upper right'))
for ax in g.axes.flat:
    ax.set_ylabel("Probability Density")
    ax.set_xlabel("Ln(Squared Projection length)")
plt.tight_layout()
plt.show()
# %%
# Massage data into single distributions per user. Each column holds all data for 1 user.
# Since not all users have the same amount of data, we will have missing data at the ends of the columns.
null_data = df.melt(id_vars='user', value_vars=['parallel', 'orthogonal'], var_name='projection', value_name='ln(proj_sq)').drop('projection', axis='columns')
null_data['idx'] = null_data.groupby('user').cumcount()
null_data = null_data.pivot(index='idx', columns='user', values='ln(proj_sq)')
n_users = df['user'].nunique()
# %%
with pm.Model() as null_model:
    null_model.name = "Null Model"
    mu = pm.Normal("mu", mu=0.8, sigma=1.25, shape=n_users)
    sigma = pm.Gamma("sigma", alpha=25.5, beta=9, shape=n_users)
    obs = pm.Normal("obs", mu=mu, sigma=sigma, shape=n_users, observed=null_data)
# %%
with null_model:
    idata0 = pm.sample(1000, tune=800, return_inferencedata=True)
# %%
az.summary(idata0, var_names=["Null Model_mu", "Null Model_sigma"])
# %%
# Massage data into single distributions per user and projection direction.
proj_effect_data = df.loc[:, ['user', 'parallel', 'orthogonal']]
proj_effect_data['idx'] = proj_effect_data.groupby('user').cumcount()
proj_effect_data = proj_effect_data.pivot(index='idx', columns='user', values=['parallel', 'orthogonal'])
# %%
with pm.Model() as projection_model:
    projection_model.name = "Main Effect Projection"
    mu_ortho = pm.Normal("mu_ortho", mu=0.09, sigma=2.55, shape=n_users)
    mu_parallel = pm.Normal("mu_parallel", mu=1.5, sigma=2.5, shape=n_users)
    sigma = pm.Gamma("sigma", mu=2.65, sigma=0.25, shape=n_users)
    obs_parallel = pm.Normal("obs_parallel", mu=mu_parallel, sigma=sigma, shape=n_users,
                             observed=proj_effect_data['parallel'])
    obs_orthogonal = pm.Normal("obs_orthogonal", mu=mu_ortho, sigma=sigma, shape=n_users,
                               observed=proj_effect_data['orthogonal'])
    diff = pm.Deterministic('mu_diff', mu_parallel - mu_ortho)
# %%
with projection_model:
    idata1 = pm.sample(1000, tune=800, return_inferencedata=True)
# %%
az.summary(idata1, var_names=["Main Effect Projection_mu", "Main Effect Projection_sigma"], filter_vars='like')
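From this model I can also look at the posterior of the deterministic difference to see how often \mu_\parallel > \mu_\perp holds (the variable name gets the model-name prefix, as with the summaries above):

# Posterior probability of mu_parallel > mu_ortho, per user.
p_greater = (idata1.posterior['Main Effect Projection_mu_diff'] > 0).mean(dim=("chain", "draw"))

But I’m not sure this is the same as comparing P(\mu_\parallel > \mu_\perp|D) vs. P(\mu_\parallel = \mu_\perp|D) via marginal likelihoods.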
There are more models I want to evaluate for main effects of block and for interactions (e.g. only the parallel projection variance is smaller in block 2), so I need to understand the basics first.
Previous approach:
Before I decided to log-transform the data, \mu_\perp and a difference \Delta\mu were modelled using gamma distributions, and obs_parallel had \mu_\perp + \Delta\mu as its mu-parameter, which is how \mu_\parallel was constructed.
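Roughly like this (a sketch from memory; the gamma parameters are placeholders and the likelihood part is omitted):

with pm.Model() as previous_model:
    mu_ortho = pm.Gamma("mu_ortho", mu=1.0, sigma=1.0)   # placeholder prior
    delta_mu = pm.Gamma("delta_mu", mu=1.0, sigma=1.0)   # positive by construction
    mu_parallel = pm.Deterministic("mu_parallel", mu_ortho + delta_mu)
    # obs_parallel then used mu_parallel as its mu-parameter.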
I also created the models separately for each participant to feed the results to the bridge sampling. Then I found out I could vectorize the model across participants. So now I need to figure out if and how the bridge sampling works with the vectorized form. The whole project can be found here.