# Lognormal model for reaction times - how to specify?

Hello all!

I have some (toy) reaction time data that I would like to model using PyMC3. The reaction times are assumed to have been measured on two groups: baseline (n=17) and treatment (n=16), with different individuals in each group (i.e. between-subjects data).

To begin with, I have specified a log-normal model for reaction time as a function of condition as follows: RT ~ Lognormal(mu, sigma), where mu = intercept(baseline) + beta(treatment) * X,

where X is a dummy-coded variable indicating which participants belonged to the treatment group:

```python
data = data.join(pd.get_dummies(data.Task))
X = np.vstack((data['treatment'])).T
# X is an (n, 1) column of 0/1 indicators (1 = treatment group)
```

```python
with pm.Model() as RT_model:
    baseline = pm.Normal('baseline', mu=0, sd=3)
    treatment = pm.Normal('treatment', mu=0, sd=1, shape=1)
    sigma = pm.HalfNormal('sigma', 3)
    mu = baseline + pm.math.dot(X, treatment)

    RT_pred = pm.Lognormal('RT_pred', mu=mu, sd=sigma, observed=data['RT'])
    trace = pm.sample(5000, chains=2, target_accept=0.95)
```


First question: Is it correct to specify my priors on the log scale, since my model is log-normal and my parameters (intercept, beta, sigma) act on the log scale?
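For context, one way to see what such log-scale priors imply on the original (seconds) scale is to simulate from them by hand. This is a minimal sketch, assuming the `Normal(0, 3)` intercept and `HalfNormal(3)` sigma priors from the model above:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# draw parameters from the priors used in the model above
baseline = rng.normal(0, 3, n)        # Normal(0, 3) on the log scale
sigma = np.abs(rng.normal(0, 3, n))   # HalfNormal(3)

# implied prior distribution of a baseline-group reaction time
rt = rng.lognormal(mean=baseline, sigma=sigma)
print(np.median(rt))
```

With priors this wide, the implied reaction times span many orders of magnitude, which is exactly the kind of thing a prior predictive check makes visible.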

Second question: Is there a way to perform prior predictive checks and directly obtain predicted reaction times based only on my priors? I'm thinking of something like the posterior predictive check, where you directly obtain reaction times together with the parameters. I have used `pm.sample_prior_predictive()` to obtain distributions of my parameters (intercept, beta, sigma), but I don't quite understand whether there is a way to obtain predicted reaction times.

Third question: How do I update my model to allow for a different variance in each condition? For example, judging by the figure below, I thought I might want to allow a lower variance for the baseline condition and a greater variance for the treatment condition.

Fourth question: If I want to include a second predictor variable, what is the best way to do that? In my toy dataset I included a variable that I call task/no_task, with the assumption that some of the participants in both the treatment and baseline groups performed a task and some did not. The purpose of the model could, for example, be to understand which of condition and task has the greater impact on reaction time. However, I would still like to be able to compare reaction times across conditions (i.e. treatment vs. baseline). My first attempt at such a model is below.

The X in this case includes the same treatment column as in the first model, plus two additional columns indicating which participants performed a task and which did not.

```python
data = data.join(pd.get_dummies(data.Condition))
```

```python
with pm.Model() as RT_model_2:
    intercept = pm.Normal('intercept', mu=0, sd=3)
    beta = pm.Normal('beta', mu=0, sd=1, shape=3)
    sigma = pm.HalfNormal('sigma', 3)
    mu = intercept + pm.math.dot(X, beta)

    RT_pred = pm.Lognormal('RT_pred', mu=mu, sd=sigma, observed=data['RT'])
    trace2 = pm.sample(5000, target_accept=0.95)
```


What I find a little confusing about this model is the interpretation of the intercept. What does it really mean here? Is there a better way to specify such a model?

Hope my questions are clear. I attach the toy dataset: toy_reaction_time_data.txt (845 Bytes). Thanks in advance!

Yes - note that the mode of a lognormal is e^{\mu-\sigma^2}, so you would need to adjust the prior if you want to match the mode.
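As a quick numeric sanity check of that mode formula (a sketch using SciPy, with arbitrary example values of mu and sigma):

```python
import numpy as np
from scipy.stats import lognorm

mu, sigma = 0.5, 0.8

# analytic mode of Lognormal(mu, sigma)
mode = np.exp(mu - sigma**2)

# numeric check: locate the maximum of the pdf on a fine grid
# (SciPy parameterizes the lognormal as s=sigma, scale=exp(mu))
x = np.linspace(0.01, 10, 200_000)
numeric_mode = x[np.argmax(lognorm.pdf(x, s=sigma, scale=np.exp(mu)))]

print(mode, numeric_mode)  # agree to within the grid spacing
```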

`pm.sample_prior_predictive()` is exactly what you are looking for: it generates the prior samples and the prior predictive samples (i.e., simulations of your observations from the prior). In your case you will get a Python dictionary as output of the function, and you can do something like `output_dict['RT_pred']`.

For different variances per condition, you can do something like:

```python
with pm.Model() as RT_model:
    ...
    sigma = pm.HalfNormal('sigma', 3, shape=2)
    # X here must be a 1-D integer index (0 = baseline, 1 = treatment)
    RT_pred = pm.Lognormal('RT_pred', mu=mu, sd=sigma[X], observed=data['RT'])
```
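The `sigma[X]` trick is plain fancy indexing: each observation picks up the standard deviation of its own group. A minimal NumPy illustration with made-up group labels and sds:

```python
import numpy as np

sigma = np.array([0.3, 0.7])          # per-group sds: [baseline, treatment]
cond_idx = np.array([0, 0, 1, 1, 0])  # 0 = baseline, 1 = treatment
per_obs_sd = sigma[cond_idx]          # one sd per observation
print(per_obs_sd)  # [0.3 0.3 0.7 0.7 0.3]
```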


Adding a new column to your predictor matrix is the way to go. How to interpret the parameters is a good question; it usually depends on how you coded your predictors. For example, since you have a factorial design here (a 2x2), you can have four binary columns indicating:

treatment_and_task, treatment_and_no_task, baseline_and_task, baseline_and_no_task

and use indexing to pick the corresponding parameter for each observation.
Alternatively, you can code the predictors as in an ANOVA model, where the columns are defined additively:

intercept (baseline_no_task), treatment (treatment == 1), task (task == 1), interaction (treatment and task both == 1)

For more information see: Design matrix (Wikipedia) and Coding categorical data (patsy documentation).
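A small pandas sketch of that additive (ANOVA-style) coding, using made-up data and assuming columns named 'Condition' and 'Task':

```python
import pandas as pd

# hypothetical 2x2 toy frame
df = pd.DataFrame({
    'Condition': ['baseline', 'baseline', 'treatment', 'treatment'],
    'Task':      ['no_task', 'task', 'no_task', 'task'],
})

# additive coding: the intercept then represents baseline + no_task
X = pd.DataFrame({
    'treatment': (df['Condition'] == 'treatment').astype(int),
    'task': (df['Task'] == 'task').astype(int),
})
X['interaction'] = X['treatment'] * X['task']
print(X.to_numpy())
```

With this coding, each beta reads as a shift on the log scale relative to the baseline/no_task cell, and the interaction column captures the extra effect when both apply.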


Thanks a lot @junpenglao!

One question about your answer: "`pm.sample_prior_predictive()` is exactly what you are looking for, it generates the prior samples, and the prior predictive samples (i.e., the simulation from prior of your observation). In your case you will get a python dictionary as output of the function, and you can do something like: `output_dict['RT_pred']`".

The thing is, I don't get "RT_pred" as a key in the output_dict. The only keys I have are "baseline", "treatment", "sigma_log__" and "sigma". Do you think this is a problem with my PyMC3 version? I currently have 3.9.3.

Using the sample data you provided:

```python
import pandas as pd
import numpy as np
import pymc3 as pm

# load the attached toy dataset first, e.g. data = pd.read_csv(...)
obs = data['RT'].map(lambda x: np.float32(x.replace(',', '.')))
X = pd.get_dummies(data['Condition'], drop_first=True)

with pm.Model() as RT_model:
    baseline = pm.Normal('baseline', mu=0, sd=3)
    treatment = pm.Normal('treatment', mu=0, sd=1, shape=1)
    sigma = pm.HalfNormal('sigma', 3)
    mu = baseline + pm.math.dot(X, treatment)
    RT_pred = pm.Lognormal('RT_pred', mu=mu, sd=sigma, observed=obs)
    prior_predictive = pm.sample_prior_predictive()

prior_predictive['RT_pred']
```


Thanks, that solved my problem!