Behrens' Bayesian Learner Model: How to share parameters between steps?

Now that the new-user posting limit has expired, I am back :wink:

Here is the plot.

As a comparison, Michael Waskom (the author of seaborn) implemented the model using NumPy. Here is his result. His code can be found here

Can you edit the previous post to include the model code (and perhaps the data if that’s feasible)?

Edit: the plots for I and k (and the pairwise) might also be useful. Do you get any warnings / divergences during sampling?

I edited the first post and commented on the data generation code. I added Waskom’s code as well.

I got some warnings like this last time, when I tried to use the logistic random walk code:

The rhat statistic is larger than 1.4 for some parameters. The sampler did not converge.
The estimated number of effective samples is smaller than 200 for some parameters.

I read the PyMC3 tutorial Updating priors. I believe it gives me a solution that can preserve more information between time steps.
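The idea behind that tutorial can be illustrated with a toy conjugate Beta-Bernoulli update (my own sketch, not the tutorial's code): the posterior after one batch of data becomes the prior for the next batch, so no information is discarded between steps.

```python
# Beta(1, 1) prior over the success probability
a, b = 1.0, 1.0

# two batches of Bernoulli observations arriving at different time steps
batches = [[1, 1, 0], [1, 0, 0, 1]]

for batch in batches:
    # conjugate update: posterior is Beta(a + successes, b + failures)
    a += sum(batch)
    b += len(batch) - sum(batch)

posterior_mean = a / (a + b)  # carries information from all batches
```

The non-conjugate case in the tutorial interpolates the sampled posterior into a new prior, but the principle is the same.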

I updated my notebook with a GaussianRW and it seems to do as well as the model with the betas (and sample much faster): Google Colaboratory

Neither model includes the volatility term, but so far I see no evidence that the integrated model won’t work (i.e., there is no need to fit one step at a time, which would always discard information).
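For reference, the generative process behind the logistic random-walk idea can be simulated in NumPy (a sketch with made-up parameter values): the latent log-odds drift as a Gaussian random walk and are squashed into probabilities for the Bernoulli observations.

```python
import numpy as np

rng = np.random.default_rng(42)
n_trials, step_sd = 200, 0.2

# latent log-odds follow a Gaussian random walk
z = np.cumsum(rng.normal(0.0, step_sd, size=n_trials))

# inverse-logit maps the walk into (0, 1)
p = 1.0 / (1.0 + np.exp(-z))

# Bernoulli observations, analogous to the env sequence in this thread
y = rng.binomial(1, p)
```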

Your code inspired me to write this:

# Create the test data
import numpy as np
import pandas as pd
import pymc3 as pm
from scipy import stats

p = np.repeat([.75, .75, .75, .25, .75, .25], 5)
env = stats.binom.rvs(1, p)
choose = list(map(bool, env))

observed_data = pd.DataFrame({"choose": choose,
                              "index": range(len(env))})

with pm.Model() as bayesian_lerner_model:
    k = pm.GaussianRandomWalk("k", mu = 5, sigma = 100, shape = len(env))
    k_ = pm.Deterministic('k_cap', pm.math.exp(k))
    v = pm.GaussianRandomWalk("v", mu = 0.7, sigma = k_, testval = 0.05, shape = len(env))
    v_ = pm.Deterministic('v_cap', pm.math.exp(v))

    r = []
    for ti in range(len(env)):
        if ti == 0:
            # Testvals are used to prevent -inf initial probability
            r.append(pm.Beta(f'r{ti}', mu=0.5, sigma=1))
        else: 
            r.append(pm.Beta(f'r{ti}', mu=r[ti-1], sigma=v_[ti - 1]))

    r = pm.Deterministic('r', pm.math.stack(r))
    y = pm.Bernoulli("y", p = r, observed = env)
        
    map_estimate = pm.find_MAP()

Unfortunately, I still hit the error about the initial evaluation of the model.

Here is my notebook.

That prior is way too broad, especially since you are taking the exponent of v. One standard deviation is already enormous: np.exp(105) is 3.989519570547216e+45.

The mu / sigma parameterization is very sensitive, as sigma is only valid if sigma < np.sqrt(mu * (1 - mu)). You have to at least set a valid testval for each Beta, but this hard constraint will make sampling very inefficient (and there is no reason to use find_MAP if you can use MCMC sampling). The lack of testvals is probably what is giving you the invalid initial evaluation.
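The constraint can be checked directly. To my understanding, the mu/sigma parameterization is converted to alpha/beta via kappa = mu*(1 - mu)/sigma**2 - 1, which is only positive when sigma < sqrt(mu*(1 - mu)). Plugging in the thread's mu = 0.5, sigma = 1:

```python
import numpy as np

mu, sigma = 0.5, 1.0
sigma_max = np.sqrt(mu * (1 - mu))  # 0.5 here, so sigma = 1 is invalid

# mean/sd -> alpha/beta conversion (as I understand PyMC3 does it)
kappa = mu * (1 - mu) / sigma**2 - 1
alpha = mu * kappa
beta = (1 - mu) * kappa
# kappa = -0.75, so alpha and beta come out negative: not a valid Beta
```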

You can parameterize your Betas with alpha = w*(k-2) + 1 and beta = (1-w)*(k-2) + 1 where w = r[ti-1] and k = 1 / v_[ti-1]: Beta distribution - Wikipedia
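In that parameterization, w is the mode and k the concentration of the Beta: alpha + beta sums to k, and the mode (alpha - 1)/(alpha + beta - 2) recovers w exactly. A quick check with my own example values:

```python
def beta_params(w, k):
    """Mode/concentration parameterization from the Wikipedia page."""
    alpha = w * (k - 2) + 1
    beta = (1 - w) * (k - 2) + 1
    return alpha, beta

w, k = 0.7, 50.0
alpha, beta = beta_params(w, k)

total = alpha + beta                     # equals k
mode = (alpha - 1) / (alpha + beta - 2)  # equals w
```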


I wrote code like this:

import numpy as np
import pandas as pd
import pymc3 as pm
from scipy import stats

p = np.repeat([.75, .75, .75, .25, .75, .25], 18)
env = stats.binom.rvs(1, p)
choose = list(map(bool, env))

observed_data = pd.DataFrame({"choose": choose,
                              "index": range(len(env))})

with pm.Model() as bayesian_lerner_model:
    k = pm.Normal("k", mu = 1, sigma = 1, testval = 0.6)
    k_ = pm.Deterministic('k_cap', pm.math.exp(k))
    v = pm.GaussianRandomWalk("v", mu = 0.7, sigma = k_, testval = 0.05, shape = len(env))
    v_ = pm.Deterministic('v_cap', pm.math.exp(v))

    r = []
    for ti in range(len(env)):
        if ti == 0:
            # Testvals are used to prevent -inf initial probability
            r.append(pm.Beta(f'r{ti}', 1, 1))
        else: 
            w = r[ti-1]
            k_t = 1 / v_[ti-1]  # local name, so the RV k above is not shadowed
            r.append(pm.Beta(f'r{ti}', alpha=w*(k_t-2) + 1, beta=(1-w)*(k_t-2) + 1, testval = 0.5))

    r = pm.Deterministic('r', pm.math.stack(r))
    y = pm.Bernoulli("y", p = r, observed = env)
    
    trace = pm.sample(return_inferencedata=True)

It seems right (maybe?) and works with fewer steps. But when I add more steps, I still get a -inf logp.

That’s surprising. Do you see any r equal to 0 or 1? You can remove the observations and just sample from the prior to see what they look like.

Can you print bayesian_lerner_model.check_test_point(), to see where the -inf is coming from?

It looks like the -inf appears at the sampling stage. When I print bayesian_lerner_model.check_test_point(), all of the r_logodds__ share the same -1.87 logp.

But when I start the sampling, it will show a SamplingError like this:

Initial evaluation results:
k                  -0.92
v                -212.25
r0_logodds__       -1.44
r1_logodds__       -1.61
r2_logodds__       -1.60
                   ...  
r104_logodds__     -1.49
r105_logodds__     -1.67
r106_logodds__      -inf
r107_logodds__     -1.35
y                 -77.09
Name: Log-probability of test_point, Length: 111, dtype: float64

I don’t know why there are differences between check_test_point and the sampling starting point.

The sampler by default adds initial jitter around the initval. You can try setting init="adapt_diag" in pm.sample. But the bigger problem is that the model is still numerically unstable (even though the current parameterization is unbounded). Can you try reducing the mean of v? It may get very large for the later trials if the sequence is long enough.
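One way the -inf can arise under this parameterization (my own illustration): if the random walk pushes v up, v_cap = exp(v) grows, k = 1/v_cap falls below 2, and the Beta's alpha can go non-positive, which has zero probability.

```python
w = 0.8  # previous r, as in the model above

for v_cap in [0.05, 0.5, 5.0]:
    k = 1 / v_cap
    alpha = w * (k - 2) + 1
    beta = (1 - w) * (k - 2) + 1
    # with v_cap = 5.0: k = 0.2 and alpha is about -0.44, an invalid Beta
```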

I added init="adapt_diag" and the sampler ran smoothly. Thanks!

And it looks like it can capture the volatility of r.

The volatility of v looks better too.

I am not sure I have the correct model, because Waskom’s code generates a smoother curve. ( \hat{i} is the inverse of v_cap in my model.)


Looks like it could be just a difference in the volatility priors (I suspect coming mostly from the mu of the GaussianRW).

Your v_cap seems to be around 1/0.1 = 10, which is twice as large as the one in their model. On the other hand, I am not sure whether the concentration should map to 1/sigma or 1/sigma**2. If it’s the latter, your average would be roughly equivalent to sqrt(1/0.1) = 3.16, which looks to be in the same ballpark as the plot.
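Whether the concentration behaves like 1/sigma or 1/sigma**2 can be checked numerically: for a Beta with mean w and concentration nu = alpha + beta, the standard deviation is exactly sqrt(w*(1 - w)/(nu + 1)), so the concentration scales like 1/sigma**2. A quick check with scipy (my own numbers):

```python
import numpy as np
from scipy import stats

w, nu = 0.5, 10.0  # mean and concentration
alpha, beta = w * nu, (1 - w) * nu

sd = stats.beta(alpha, beta).std()
approx = np.sqrt(w * (1 - w) / (nu + 1))  # sd ~ 1/sqrt(concentration)
```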

Edit: Looking at the raw data, I think your results look more “sensible”

I think so. My model looks more “sensible” than Waskom’s model.

And I think my model still captures the volatility of the sequence and the probability p, so I think I did reproduce Behrens’ model.

I may write a tutorial next week (my supervisor gave me a huge amount of work on my fMRI data, and the deadline is next Monday) and let more people check our model thoroughly.

I hit a new problem: the r.append(pm.Beta('r')) loop generates so many RVs that compilation raises an error: “fatal error: bracket nesting level exceeded maximum of 256” :cry:

Here is the full report:

Exception: ('Compilation failed (return status=1): ~/.theano/compiledir_macOS-10.16-x86_64-i386-64bit-i386-3.8.10-64/tmp_kphrv89/mod.cpp:30692:32: fatal error: bracket nesting level exceeded maximum of 256.         if (!PyErr_Occurred()) {.                                ^. ~/.theano/compiledir_macOS-10.16-x86_64-i386-64bit-i386-3.8.10-64/tmp_kphrv89/mod.cpp:30692:32: note: use -fbracket-depth=N to increase maximum nesting level. 1 error generated.. ', "FunctionGraph(MakeVector{dtype='float64'}(r0, r1, r2, ...... , r678, r679))")

I also set theano.config.gcc.cxxflags = "-fbracket-depth=2048" and sys.setrecursionlimit(10**6). Then Python raised another error:

maximum recursion depth exceeded while calling a Python object

So I think the for loop may not be a good way to build my model.

I tried to use theano.scan() instead of the for loop to solve this problem. Here is my code:

import numpy as np
import pandas as pd
import pymc3 as pm
import theano
import theano.tensor as T
from scipy import stats

p = np.repeat([.75, .75, .75, .25, .75, .25], 18)
env = stats.binom.rvs(1, p)
choose = list(map(bool, env))

observed_data = pd.DataFrame({"choose": choose,
                              "index": range(len(env))})

with pm.Model() as bayesian_lerner_model:
    k  = pm.Normal("k", mu = 1, sigma = 0.5, testval = 0.6)
    k_ = pm.Deterministic('k_cap', pm.math.exp(k))
    v  = pm.GaussianRandomWalk("v", mu = 0.07, sigma = k_, testval = 0.05, shape = len(env))
    v_ = pm.Deterministic('v_cap', pm.math.exp(v))

    
    r0 = pm.Beta('r0', mu = 1, sigma = 1)
    step = T.as_tensor_variable(np.array(range(len(env))))
    def update_r(v_, step, r0):
        w = r0
        k = 1 / v_
        r1 = pm.Beta('r'+str(step), alpha=w*(k-2) + 1, beta=(1-w)*(k-2) + 1, testval = 0.5)
        return r1
    
    result, updates = theano.scan(fn = update_r,
                                  sequences = [v_,step],
                                  outputs_info = r0)

    y = pm.Bernoulli("y", p = result, observed = env)
    
    trace = pm.sample(init="adapt_diag")

Unfortunately, I still don’t understand how to use theano.scan() to create RVs, so I get an error:

MissingInputError                         Traceback (most recent call last)
<ipython-input-24-7137d10b5bb4> in <module>
     29     y = pm.Bernoulli("y", p = result, observed = env)
     30 
---> 31     trace = pm.sample(init="adapt_diag")

...

MissingInputError: Input 1 of the graph (indices start from 0), used to compute Elemwise{true_div,no_inplace}(TensorConstant{1}, v_cap[t]), was not provided and not given a value. Use the Theano flag exception_verbosity='high', for more information on this error.

Hey! I am also in the process of re-creating the volatility model. I did it a couple of years ago using BUGS in R, but as I am shifting to Python/PyMC3 I wanted to give it a go as well. This thread has been very helpful, so I will have a go and post here.

I understand close to nothing about theano.scan(), but @ricardoV94 made this amazing tutorial on fitting iterative RL models, which I think can be adapted to the volatility model, in case you are still stuck @dddd1007


It may be because of my weak programming skills in PyMC3, but it looks hard to implement an HMM there.

I rewrote this model in Turing.jl and Stan. It’s so easy to create RVs using a for loop in those frameworks.

Now I can implement this model with higher performance, and I think you should try them.


I am very interested to give Turing a try actually, so I just might.

maximum recursion depth exceeded while calling a Python object

So I think the for loop may not be a good way to build my model.

The for loop is definitely sub-optimal on the PyMC3 side (too many variables to keep track of), but the error you are reporting seems to come from a Python issue/error in your code, not from a PyMC3 limitation.

Never mind, I found something like this here: python - “Bracket nesting level exceeded maximum” when working with large dataset in PyMC3 - Stack Overflow

I had no idea about this sort of limitation in Theano.

Regarding Scan, you cannot use it to create new variables as in the code snippet you shared (you might in a future PyMC3 version though).

By the way, do you want to share the Stan code?

For HMMs in general, there are some more specialized libraries built on top of PyMC3, with custom Distributions and samplers, such as GitHub - AmpersandTV/pymc3-hmm: Hidden Markov models in PyMC3, which might be easier to work with.
