# Replacing nested for loops from JAGS code for cognitive model of temporal discounting

Hi all,

I’m trying to implement some of the models from Farell and Lewandowsky (2018). I’m up to the last Bayesian hierarchical model example in Chapter 9, which describes a model of temporal discounting given the value and delay of options A and B. However I’m having some difficulties translating the nested for-loops in the JAGS code into PyMC3 code.

## The Model

V^B = B \times \frac{1}{1+kD}

Where V^B is the ‘present subjective value’ for the delayed reward B (e.g. an amount of money), and D is the delay of this reward (e.g. a time in months). The parameter k determines the steepness of the function.

Once the competing amounts are discounted according to this equation, the participant chooses between V^A and V^B according to:

P(\text{choose }V^B) = \Phi \Bigg( \frac{V^B - V^A}{\alpha} \Bigg)

Where \alpha determines the ‘sharpness’ of the decision, and \Phi is the normal CDF. This equation determines the probability of the observed choice (either option A or B, given the discounting).

We want to estimate \alpha and k parameters that are subject-specific, but are assigned (bounded) normal distribution hyperpriors. The V^A and V^B clearly depend upon both the size and delay of the reward.

## The JAGS code

The JAGS code makes this model work by a nested for-loop, where the participant-level parameters are estimated, and then the probability of the choice for each trial (total number of trials denoted as T) has an estimated V^A and V^B.

model{
# Group-level hyperpriors
# k (steepness of hyperbolic discounting)
groupkmu        ~ dnorm(0, 1/100)
groupksigma     ~ dunif(0, 100)

# comparison acuity (alpha)
groupALPHAmu        ~ dunif(0,5)
groupALPHAsigma     ~ dunif(0,5)

# Participant-level parameters
for (p in 1:nsubj){
k[p]        ~ dnorm(groupkmu, 1/(groupksigma^2)) T(0,)
alpha[p]    ~ dnorm(groupALPHAmu, 1/(groupALPHAsigma^2)) T(0,)

for (t in 1:T) {
# calculate present subjective value for each reward
VA[p,t] <- A[p,t] / (1+k[p]*DA[p,t])
VB[p,t] <- B[p,t] / (1+k[p]*DB[p,t])

# Psychometric function yields predicted choice
P[p,t] <-  phi( (VB[p,t]-VA[p,t]) / alpha[p] )

# Observed responses
R[p,t] ~ dbern(P[p,t])
}
}
}


# My PyMC3 Code

To replace the for-loops, I’ve tried separating the data into n_{\text{participants}} by n_{\text{trials}} matrices, and using broadcasting on the estimated subject-specific a and k parameters to write the equations in the model as-is. The issues is that I’m getting ‘bad initial energy’ warnings and the model won’t run.

import numpy as np
import pymc3 as pm

# Import data
dat = pd.read_csv("https://raw.githubusercontent.com/psy-farrell/computational-modelling/master/codeFromBook/Chapter9/hierarchicalITC.dat", sep = "\t")

# Experiment info
n_subj = len(dat['subj'].unique())
n_trials = int(dat.shape[0]/n_subj)

# Convenience lambda to reshape the data
reshape = lambda var: dat[var].values.reshape((n_subj, n_trials))

# Extract variables in n * t matrices
A, DA = reshape('A'), reshape('DA')
B, DB = reshape('B'), reshape('DB')
R = reshape('R')

def Phi(x):
#'Cumulative distribution function for the standard normal distribution'
# A.k.a the probit transform
return(0.5 + 0.5 * pm.math.erf(x/pm.math.sqrt(2)))

with pm.Model() as Discounting:
# Priors for Hyperbolic Discounting Parameter K
k_mu = pm.Normal('K_mu', mu = 0, sigma = 10)
k_sigma = pm.Uniform('K_sigma', lower = 0, upper = 10)

# Priors for Comparison Acuity (alpha)
alpha_mu = pm.Uniform('Alpha_mu', lower = 0, upper = 5)
alpha_sigma = pm.Uniform('Alpha_sigma', lower = 0, upper = 5)

# Participant Parameters
BoundedNormal = pm.Bound(pm.Normal, lower = 0.0)
k = BoundedNormal('k', mu = k_mu, sigma = k_sigma, shape = n_subj)
alpha = BoundedNormal('alpha', mu = alpha_mu, sigma = alpha_sigma, shape = n_subj)

# Subjective Values
V_A = pm.Deterministic("V_A", A / (1 + k[:, None] * DA))
V_B = pm.Deterministic("V_B", B / (1 + k[:, None] * DB))

# Psychometric function
P = pm.Deterministic('P', Phi((V_B - V_A)/alpha[:, None]))

# Observed
h = pm.Bernoulli('h', p = P, observed = R)

# Sample
Discounting_trace = pm.sample(5000, init = 'adapt_diag', chains = 4)


Any thoughts on how I can properly estimate this?

Thanks!

I think your starting values lead to probabilities P of zero, which lead to -inf Bernoulli probability conditioned on non-zero data. You will have to tweak your distribution testval (starting values) so that P does not start as 0 or 1 for any observed value that is not 0 or 1.

I commented out the likelihood and did this to confirm my suspicion:


...
# Observed
#h = pm.Bernoulli('h', p = P, observed = R)

with Discounting:
trace = pm.sample(tune=0, draws=1, chains=1)

np.sum((trace['P'] == 0.0) & (R != 0))
120


There are 120 observed values that are impossible given the start P vals in that run.

It seems to be caused by underflow in the Phi function which leads to zero when (V_B - V_A) / alpha are very negative. In fact anything smaller than -9 underflows to zero:

Phi(np.array([-13.0, -11.0, -9.0, -7.0])).eval()
array([0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 1.27980959e-12])


One solution is to work in the logit scale:

...
# Import the helper normal log cdf so that we can get the log of Phi
from pymc3.distributions.dist_math import normal_lcdf

#P = pm.Deterministic('P', Phi((V_B - V_A)/alpha[:, None]))

# logit(p) = log(p) - log(1-p)
log_P = normal_lcdf(0, 1, (V_B - V_A) / alpha[:, None])  # log(p)
log_1m_P = pm.math.log1mexp(-log_P)  # log(1-p)
logit_P = pm.Deterministic('logit_P', log_P - log_1m_P)
# Observed
h = pm.Bernoulli('h', logit_p=logit_P, observed = R)

Discounting.check_test_point()

K_mu                         -3.22
K_sigma_interval__           -1.39
Alpha_mu_interval__          -1.39
Alpha_sigma_interval__       -1.39
k_lowerbound__              -38.23
alpha_lowerbound__          -30.23
h                        -68137.95  # This is no longer -inf!
Name: Log-probability of test_point, dtype: float64


If this works, please make sure the values make sense and that I did not introduce any mistake with my changes. In particular if you wrap the logit_P in new_P = pm.Deterministic('new_P', pm.math.invlogit(logit_p)) you should see that new_P and your original P (if you uncomment it) give the same results, except for the cases where your original P underflowed to zero.

3 Likes

Fantastic solution, thank you! I hadn’t thought to check for underflow. The model now runs and generates roughly similar parameter estimates to those shown in the textbook.

I might need to re-parameterise further, as there are quite a number of divergences after tuning, even after pushing up both target_accept and tune values on pm.sample(). Using new_P also results in heaps of posterior samples of almost exactly 0 or 1 with fairly minuscule sd values. But maybe that’s to be expected. I’ll try experimenting with the model a bit to find solutions for these issues myself.

Thanks very much for the help!

That chapter gives an overflow of my discounting model I published in 2015 Although I only just saw this post before it was answered.

Vincent, B. T. (2016). Hierarchical Bayesian estimation and hypothesis testing for delay discounting tasks. Behavior research methods, 48(4), 1608-1620.

1 Like

If you are dealing with data where the delayed rewards are always immediate, then all the DA values will be zero. I had previously gotten around this by simply setting V_A = A, that is the subjective value of choice A is equal to its objective value. This is only valid to do if choice A is delivered immediately and is equivalent to saying it is not discounted at all.

So this might be useful @awhug
A while ago, I switched to using the logistic function rather than the cumulative normal. It seems to work better numerically.

def logistic(x, α):
return 1.0 / (1.0 + pm.math.exp(-α * x))

logistic(np.array([-100.0, -50.0, -20.0, -10.0, -1.0]), 1).eval()


results in

array([3.72007598e-44, 1.92874985e-22, 2.06115362e-09, 4.53978687e-05,
2.68941421e-01])


It also avoids the divergences in sampling.

Excellent feedback. Very cool to have the model’s author weigh in - should’ve cited you in the original!

You’re right - in this data DA is indeed zero on every observation (i.e. immediate delivery).

I previously did try dropping V_A previously and directly calculating log_P = normal_lcdf(0, 1, (V_B - A) / alpha[:, None]) after I got the model working, although this didn’t help much with the divergences.

Re the logistic function - I’m assuming α still defines the steepness of the curve (taking the place of \alpha in the cumulative normal), so now my code looks like this

def logistic(α, x):
return 1.0 / (1.0 + pm.math.exp(-α* x))
...
# Psychometric function
P = logistic(alpha[:, None], (V_B - V_A))


In which case, this reparameterisation hasn’t worked for me unfortunately. Still hundreds of divergences. Perhaps I’ve misunderstood though.

Tightening the prior on alpha_sigma to uniform(0, 5) has helped a little. This parameter generally seems to have a lot of uncertainty and a low ESS. Tricky to diagnose the exact source of the problem though.

1 Like

Forgot to mention I set target_accept=0.9 when sampling, that helps.

Yes, α corresponds to the slope of the logistic, it just doesn’t correspond to the sd of a normal any more.

Also, I use a slightly different formulation from what’s in the book chapter. With the version in the book chapter, if you get a choice for A in a zone where the model predicts a near 100% chance of choosing B, then the only way to accommodate that is by decreasing the slope of the function.

So what I do is add in an additional parameter which governs a ‘lapse rate’ so that the function asymptotes at ϵ rather than 0, or 1-ϵ rather than 1. So I actually use something like this:

def Φ(x, α, ϵ):
"""Psychometric function which converts the decision variable (VB-VA)
into a reponse probability. Output corresponds to probability of choosing
the delayed reward (option B)."""
return ϵ + (1.0 - 2.0 * ϵ) * logistic(x, α)

def logistic(x, α):
"""Logistic function. α is the steepness of the function"""
return 1.0 / (1.0 + pm.math.exp(-α * x))


with this kind of prior

ϵ = pm.Beta("ϵ", 1 + 1, 1 + 50, shape=participants)


I’ve not specifically adapted that to your code/data, but it may help. Let me know.

When applying the model to real human data (which can be messy) getting decent fits can still be a bit of an art.

2 Likes

By the way there is an implementation of the logistic distribution (and its cdf which is the logistic function) in Pymc3: Continuous — PyMC3 3.10.0 documentation

Nice. Can’t quite figure out how to call it

def Φ(x, α, ϵ):
return ϵ + (1.0 - 2.0 * ϵ) * pm.Logistic("_", 0, α).cdf(x)


doesn’t want to work

You have to use the .dist method.

def Φ(x, α, ϵ):
return ϵ + (1.0 - 2.0 * ϵ) * pm.Logistic.dist(0, α).cdf(x)


This gives you the cdf (or logp if you wanted that) function without trying to create a model random variable.

1 Like

Gives…

AttributeError: 'Logistic' object has no attribute 'cdf'

Which version of pymc3 are you using?

3.11.1

Nevermind, it should be .logcdf

I guess that might be less useful (you can always exponentiate, but by then it’s probably more complicated than the expression you had originally)

The last helper method I might suggest then is pymc3.math.invlogit: Math — PyMC3 3.10.0 documentation

1 Like

That does the job