Newbie model question

Hi all.
I am at the very early stages of translating a toolbox (https://github.com/drbenvincent/delay-discounting-analysis) from Matlab + JAGS to Python + PyMC3. Unfortunately I’m falling at the first hurdle, so I would really appreciate any insights into what I’m doing wrong here. I suspect it’s a simple syntax issue, as I’m a novice with Python.

import pandas as pd
import pymc3 as pm

# minimal example data
df = pd.DataFrame({'A' : [80, 34, 25, 11, 49],
                   'DA' : [0, 0, 0, 0, 0],
                   'B' : [85, 50, 60, 30, 60],
                   'DB' : [157, 30, 14, 7, 89],
                   'R' : [0, 1, 1, 1, 0]})

basic_model = pm.Model()
with basic_model:
    # Priors for unknown model parameters
    logk = pm.Normal('logk', mu=-4, sd=3)
    t = pm.Lognormal('t', mu=0, sd=10)
    # Value functions
    VA = pm.Deterministic('VA', df.A / (1.0+pm.math.exp(logk)*df.DA))
    VB = pm.Deterministic('VB', df.B / (1.0+pm.math.exp(logk)*df.DB))
    # Choice function: softmax
    P_chooseB = pm.Deterministic('P', pm.math.exp(VB*t) / (pm.math.exp(VA*t) + pm.math.exp(VB*t)))
    # Likelihood (sampling distribution) of observations
    R = pm.Bernoulli('R', p=P_chooseB, observed=df.R)
    
    trace = pm.sample(njobs=2)

I get quite a long error message when trying to construct this model, so I’m not even at the ‘getting inference done’ stage yet. Any help is much appreciated.

There isn’t great support for pandas Series yet – this will work if you manually cast each one to a numpy array:

basic_model = pm.Model()
with basic_model:
    # Priors for unknown model parameters
    logk = pm.Normal('logk', mu=-4, sd=3)
    t = pm.Lognormal('t', mu=0, sd=10)
    # Value functions
    VA = pm.Deterministic('VA', df.A.as_matrix() / (1.0+pm.math.exp(logk)*df.DA.as_matrix()))
    VB = pm.Deterministic('VB', df.B.as_matrix() / (1.0+pm.math.exp(logk)*df.DB.as_matrix()))
    # Choice function: softmax
    P_chooseB = pm.Deterministic('P', pm.math.exp(VB*t) / (pm.math.exp(VA*t) + pm.math.exp(VB*t)))
    # Likelihood (sampling distribution) of observations
    R = pm.Bernoulli('R', p=P_chooseB, observed=df.R.as_matrix())
    
    trace = pm.sample(njobs=2)

Also, theano has a softmax function, tt.nnet.softmax (http://deeplearning.net/software/theano/library/tensor/nnet/nnet.html), which is numerically more stable.
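
For example, you could replace the P_chooseB line with something like this (an untested sketch, reusing the VA, VB and t from your model):

import theano.tensor as tt

# one row per trial, one column per option (A, B)
V = tt.stack([VA * t, VB * t], axis=1)
# tt.nnet.softmax normalises each row in a numerically stable way;
# column 1 is then P(choose B)
P_chooseB = pm.Deterministic('P', tt.nnet.softmax(V)[:, 1])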


Thanks for this - much appreciated.

This now works as long as I remove the t parameter… it was causing numerical instability. I’ll have to reintroduce it to get the model I want, though, so I’ll look into the bounds & priors on t.
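
My first thought is just to tighten or bound the prior when I re-specify the model; a rough sketch with placeholder values, untested:

t = pm.Lognormal('t', mu=0, sd=1)  # much tighter than sd=10, to keep exp(V*t) in range
# or, alternatively, an explicitly bounded prior:
# BoundedLognormal = pm.Bound(pm.Lognormal, upper=10.0)
# t = BoundedLognormal('t', mu=0, sd=1)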

I looked up the tt.nnet.softmax function mentioned by @aseyboldt, but unfortunately it looks like it doesn’t allow for a temperature parameter, which is often used in modelling work (https://en.wikipedia.org/wiki/Softmax_function#Reinforcement_learning).

On Twitter, @junpenglao asked for feedback about PyMC3. So my first observation is that, as a newbie, I found the docs less clear on how to incorporate the deterministic parts of a model. There are some examples, but I think there could be a bit more general help on how one goes about this; deterministic components are often very important parts of cognitive models, for example.

If that’s a bit vague, here’s a concrete example… I need to explore a different choice function, and as a newbie I would not really know how to implement the cumulative normal function (Φ).
I get the hunch I can’t just use arbitrary Python math? It’s not in pymc3.math, so I’m not totally sure how to proceed with that. Happy to request it on a GitHub issue if you’d like?

My next steps are to build this simple proof of concept up to the full hierarchical version of the model. I may cry for help again!


Thanks @drbenvincent! As you observed, there are some functions in pymc3.math, and many of them are imported from theano’s basic operations. For example, the error function erf is available in theano/pymc3 (pm.math.erf), and the cumulative normal can be built from it. Otherwise, you can write your own deterministic function as a theano operation:

def custom_func(...):
    # define your deterministic function
    return ...

Thanks :slight_smile: I made a little helper function to keep the model specification tidy:

def cumulative_normal(x):
    # Φ(x) = 0.5 * (1 + erf(x / sqrt(2)))
    return 0.5 + 0.5 * pm.math.erf(x / pm.math.sqrt(2))
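
For what it’s worth, here’s roughly how I’m plugging it in as the choice function (a sketch; the (VB - VA) * t comparison term is just my current guess at a probit-style choice rule):

# probit-style choice rule: P(choose B) = Φ((VB - VA) * t)
P_chooseB = pm.Deterministic('P', cumulative_normal((VB - VA) * t))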

I think I’m on track now. My original question is answered and I’m up and running with inference. (Not sure if you can mark this as resolved, like a GitHub issue?)

I am getting some “divergent sample” warnings, though, with a model that is equivalent to one that works in JAGS. Presumably you’re going to tell me that the superior sampling algorithms in PyMC3 are highlighting a model misspecification, as opposed to JAGS being superior :thinking: I’m hoping I can resolve it by updating priors rather than by uncovering a deeper structural issue with the model.

There are quite a few posts about the divergence warning; I would suggest starting with the doc Diagnosing Biased Inference with Divergences. In short, the NUTS sampler in PyMC3 (and Stan, and many other modern Bayesian samplers) takes advantage of the geometry of the posterior distribution, so it can sample from it much more efficiently. @colcarroll’s talk on this is an excellent introduction as well.

The main approach to getting rid of the divergence warning is to reparameterize your model; for example, see this case study by @twiecki. Much of the advice in section 26 of the Stan manual also applies to PyMC3, especially when the NUTS sampler is used; there are lots of reparameterization tips in there (also see the Stan website on this).
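
The most common trick is the non-centered parameterization of hierarchical parameters. A generic sketch (placeholder names and shapes, not specific to your model):

import pymc3 as pm

n_subjects = 8  # placeholder
with pm.Model():
    mu = pm.Normal('mu', mu=0, sd=5)
    sigma = pm.HalfCauchy('sigma', beta=1)
    # centered (divergence-prone): logk = pm.Normal('logk', mu=mu, sd=sigma, shape=n_subjects)
    # non-centered: sample a unit normal and shift/scale it deterministically
    offset = pm.Normal('offset', mu=0, sd=1, shape=n_subjects)
    logk = pm.Deterministic('logk', mu + sigma * offset)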
