How to specify this model directly in PyMC3?

Hello,

I would like to model some repeated-measures data with PyMC3. The data comes from an experiment in which n=17 subjects experienced three conditions [baseline, treatment1, treatment2], and a reaction time was measured for each condition. The final data set has 51 reaction times (3 conditions * 17 subjects).

  1. I have managed to fit the following varying-intercepts model with the bambi package:

b_model = bambi.Model(data)
results = b_model.fit('reaction_time ~ condition', random=['1|subject_id'],
                      tune=2000, samples=3000, target_accept=0.9)

  2. I would like to create the same model (or a replica of the bambi model) directly with PyMC3, but cannot get it to work. As I've understood it, a hierarchical model with a varying intercept can be specified in PyMC3 using hyperpriors and priors (in my case, only for the intercept). The model I've specified so far is:

with pm.Model() as bambi_replica:
    # Hyperpriors
    mu_intercept = pm.Normal('mu_intercept', mu=0, sd=1)
    sd_intercept = pm.HalfNormal('sd_intercept', sd=1)
    # Priors
    Intercept = pm.Normal('Intercept', mu=mu_intercept, sd=sd_intercept, shape=n_subjects)
    reaction_time_sd = pm.HalfNormal('reaction_time_sd', sd=data.reaction_time.std()*2)
    condition = pm.Normal('condition', mu=0, sd=1, shape=2)
    # mu
    mu = Intercept[tp_idx] + condition[0]*x_1 + condition[1]*x_2
    # Likelihood (sampling distribution) of observations
    reaction_time = pm.Normal('Y_obs', mu=mu, sd=reaction_time_sd, observed=reaction_time_data)

Here:

  • n_subjects = 17 (since I have 17 subjects)
  • x_1 = array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0], dtype=int64) (a dummy variable marking the treatment 1 rows)
  • x_2 = array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1], dtype=int64) (a dummy variable marking the treatment 2 rows)
  • reaction_time_data = array([2.918456, 3.124924, 3.013115, 3.117612, 2.91747 , 3.157323,
    3.101546, 3.595617, 2.961656, 2.960227, 2.537308, 2.908257,
    2.971608, 3.074054, 2.870034, 2.867201, 3.149056, 3.186297,
    3.065235, 4.028026, 3.068959, 2.902312, 3.276905, 3.398952,
    3.286361, 3.105795, 3.748059, 2.893177, 4.078607, 3.412715,
    3.849621, 3.440311, 3.416428, 3.169723, 3.256486, 3.014277,
    2.745785, 4.160992, 3.226214, 4.321797, 3.05727 , 3.231891,
    3.611861, 3.229321, 2.851948, 2.622444, 2.844787, 3.229938,
    3.055713, 2.722098, 3.158651])
  • tp_idx = array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
    0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
    0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16],
    dtype=int64) (maps each observation to its subject, 0-16)
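(As a side note, these arrays can also be built from the data frame directly, e.g. something like the sketch below, assuming data has condition and subject_id columns with the values described above:)

import pandas as pd

# Sketch only: assumes `data` has a 'condition' column with values
# 'baseline' / 'treatment1' / 'treatment2' and a 'subject_id' column.
x_1 = (data['condition'] == 'treatment1').astype('int64').values
x_2 = (data['condition'] == 'treatment2').astype('int64').values
tp_idx, subjects = pd.factorize(data['subject_id'])  # codes 0..16, one per subject
n_subjects = len(subjects)
reaction_time_data = data['reaction_time'].values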

Questions:

  1. Would this approach correspond to the varying-intercepts model specified with bambi (i.e., 'reaction_time ~ condition', random=['1|subject_id'])? I understand that the priors may not be specified exactly the same, but I'm mainly interested in whether the overall structure of the model corresponds to a varying-intercepts model that could be used to model the data I have.
  2. With bambi, fitting works fine (the model seems to converge, and the traceplots look good). The model I specified directly with PyMC3, however, does not seem to converge, and I get warning messages like the ones below. I can't understand why. Any clues?

There were 104 divergences after tuning. Increase target_accept or reparameterize.
The acceptance probability does not match the target. It is 0.7975575845495201, but should be close to 0.95. Try to increase the number of tuning steps.

The estimated number of effective samples is smaller than 200 for some parameters.

All suggestions are appreciated!

Hi Linda,
At first glance I’d say your model seems right for a varying-intercepts model, and I’d expect it to run :thinking:

  • Have you tried standardizing your reaction_time_data? I’m not sure it will have an impact, as your data don’t seem to be on a huge scale, but you never know…

  • Which parameters have a small effective sample size? And which don't look good in general? This could come from sd_intercept being very close to zero (i.e., subjects being very similar to each other), in which case a non-centered parametrization could do the trick (a minimal sketch follows below).
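To make that concrete, the non-centered trick would look something like this (a sketch only, reusing the variable names from your model; the offsets replace the centered Intercept prior):

# Non-centered sketch: sample standardized N(0, 1) offsets per subject,
# then shift by the population mean and scale by the population sd.
intercept_offset = pm.Normal('intercept_offset', mu=0, sd=1, shape=n_subjects)
Intercept = pm.Deterministic('Intercept', mu_intercept + sd_intercept * intercept_offset)

mu can then be built from Intercept[tp_idx] exactly as before.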

In any case, if you haven't seen it already, here is the port to PyMC of the end-of-chapter exercises for the chapter on varying intercepts in McElreath's book – it's really useful!
Hope this helps :vulcan_salute:

Thanks @AlexAndorra :slight_smile:

  • I standardized the reaction time

standardized_rt = (data.reaction_time - reaction_time_data.mean()) / reaction_time_data.std()

That alone did not seem to make a difference.

  • I think I tried a non-centered parametrization:

mu_intercept = pm.Normal('mu_intercept', mu=reaction_time_data.mean(), sd=reaction_time_data.std()*2)

That also did not make a difference.

  • I've noticed that the bambi model uses a uniform prior on reaction_time_sd. A combination of 1) standardizing the reaction time and 2) changing the reaction_time_sd prior from a half-normal to a uniform with limits lower=0, upper=0.4 (sketched after this list) gives me a model that converges. However, the traceplot for the standard deviation looks cut off (a result of the hard bound in the uniform prior, I assume):

It seems that model convergence depends heavily on this prior choice. Any clues why?

  • Thanks for the suggestion on further readings about varying-intercepts. I had not seen that before. Will check it out! Your suggestions have been very helpful indeed!
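(For reference, the uniform prior from the third bullet above would be something like:)

# Hard-bounded uniform prior on the noise scale (the variant that converged):
reaction_time_sd = pm.Uniform('reaction_time_sd', lower=0, upper=0.4)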

You only have three data points per subject, so the choice of priors is important. I got the model to run using a non-centered parameterisation and increasing target_accept to 0.95. By non-centered, Alex means that you sample the subject intercepts as N(0, 1) and then add the population mean and scale by the population standard deviation afterwards.

with pm.Model() as bambi_replica:
    # Hyperpriors
    mu_intercept = pm.Normal('mu_intercept', mu=0, sd=1)
    sd_intercept = pm.Exponential('sd_intercept', 1.)

    # Priors
    # Centered version (what the original model used):
    # Intercept = pm.Normal('Intercept', mu=mu_intercept, sd=sd_intercept, shape=n_subjects)
    # Non-centered version: standardized N(0, 1) offsets per subject
    Intercept = pm.Normal('Intercept', mu=0, sd=1, shape=n_subjects)
    reaction_time_sd = pm.Exponential('reaction_time_sd', reaction_time_data.std()*2)
    condition = pm.Normal('condition', mu=0, sd=1, shape=2)

    # mu: shift the offsets by the population mean and scale by the population sd
    # Centered version:
    # mu = Intercept[tp_idx] + condition[0]*x_1 + condition[1]*x_2
    mu = (mu_intercept + sd_intercept*Intercept[tp_idx]) + condition[0]*x_1 + condition[1]*x_2

    # Likelihood (sampling distribution) of observations
    reaction_time = pm.Normal('Y_obs', mu=mu, sd=reaction_time_sd, observed=reaction_time_data)

    trace = pm.sample(nuts_kwargs={'target_accept': 0.95})
pm.summary(trace)

In general, flat or uniform priors on scale (standard deviation) parameters are not a good idea; you want more regularisation towards zero. Your trace of reaction_time_sd is clearly pushing against the upper limit of 0.4 you've imposed.
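Something like either of these (both already used elsewhere in this thread) would give that regularisation without a hard cutoff:

# Scale priors that decay smoothly towards zero instead of cutting off at 0.4:
reaction_time_sd = pm.Exponential('reaction_time_sd', 1.0)
# or:
# reaction_time_sd = pm.HalfNormal('reaction_time_sd', sd=1.0)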


Thanks Nicholas – couldn’t have said it better!
PS: note that in the newest PyMC version, you can just add this arg to pm.sample: target_accept=0.95 – more readable than the nuts_kwargs dict :wink:
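i.e., something like:

trace = pm.sample(target_accept=0.95)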

@tankenot, as you'll see in Nicholas's trace, a lot of the mass of sd_intercept is very near zero, as I suspected above. This is typically the situation where a non-centered parametrization works a lot better.
Hope this helps :vulcan_salute:


Thanks a lot to both of you! @AlexAndorra, @nkaimcaudle
It’s been very helpful! :star_struck:
