Hello all!
I have some (toy) reaction time data that I would like to model using Pymc3. The reaction times is assumed to have been measured on two groups: baseline (n=17) and treatment (n=16). There are different individuals in each group (i.e. between-subjects data).
To begin with I have specified a log-normal model for reaction time as a function of condition as follows: RT ~ Lognormal(mu, sigma), mu = Intercept(baseline) + beta(treatment)*X
where X is a dummy-coded variable indicating which participants belonged to the treatment group,
data = data.join(pd.get_dummies(data.Task))
X = np.vstack((data['treatment'])).T
# X = array([[0],[0],[0],[0],[0],[0],[0],[1],[1],[1],[1],[1],[1],[1],[1],[0],[0],[1],[0],[1],[0],[1],[1], [1],[1],[0],[0],[1],[0],[0],[1],[0],[0]])
with pm.Model() as RT_model:
baseline = pm.Normal('baseline', mu=0, sd=3)
treatment = pm.Normal('treatment', mu=0, sd=1, shape=1)
sigma = pm.HalfNormal('sigma', 3)
mu = baseline + pm.math.dot(X, treatment)
RT_pred = pm.Lognormal('RT_pred', mu=mu, sd=sigma, observed=data['RT'])
trace = pm.sample(5000, chains = 2, target_accept = 0.95)
First question: Is it correct to specify my priors assuming log-scale since my model is log-normal and my parameters (intercept, beta, sigma) will be on log-scale?
Second question: Is there a way to perform prior predictive checks and directly obtain predicted reaction times based only on my priors? I’m thinking something like the posterior predictive check where you directly obtain reaction time together with the parameters. I have used the “pm.sample_prior_predictive()” to obtain distributions of my parameters (intercept, beta, sigma) but I don’t quite understand if there is a way to obtain predicted reaction times?
Third question: How do I update my model to allow for different variance for my two conditions? For example, in the below figure, I thought I may want to allow for a lower variance for the baseline condition and a greater variance for the treatment condition.
Fourth question: If I want to include a second predictor variable, what is the best way to achieve that? In my toy-dataset I included a variable that I call task/no_task with the assumption that some of the participants in the treatment and baseline groups performed a task and some did not. The purpose of the model could for example be to understand which of condition and task has the greatest impact on the reaction time. However, I would also still like to be able to make comparisons of reaction times for condition (i.e. treatment vs. baseline). My first attempt of creating such model can be seen below:
The X in this case include the same column for treatment as in the first model, but with two additional columns indicating the participants that performed a task and no task.
data = data.join(pd.get_dummies(data.Condition))
data = data.join(pd.get_dummies(data.Task))
X = np.vstack((data['treatment'], data['task'], data['no_task'])).T
with pm.Model() as RT_model_2:
intercept = pm.Normal('intercept', mu=0, sd=3)
beta = pm.Normal('beta', mu=0, sd=1, shape=3)
sigma = pm.HalfNormal('sigma', 3)
mu = intercept + pm.math.dot(X, beta)
RT_pred = pm.Lognormal('RT_pred', mu=mu, sd=sigma, observed=data['RT'])
trace2 = pm.sample(5000, target_accept = 0.95)
What I think is a little confusing about this model is the interpretation of the intercept in this case? What does it really mean? Is there a better way to specify such model?
Hope my questions are clear. I attach the file with the toy-dataset.
Thanks in advance toy_reaction_time_data.txt (845 Bytes)