Markov switching dynamic autoregression

Hi all,

I have been trying to build a Markov switching dynamic autoregression model in pymc3, but I am stuck as I am unfamiliar with theano. Here is a model similar in theme to what I want, in statsmodels:
http://www.statsmodels.org/stable/examples/notebooks/generated/markov_regression.html?highlight=markov%20switching

I have created and run hidden Markov models in pymc3 based upon similar models:

but the definition of a time-dependent transition matrix with tensors stumps me. Can someone please give me a hint as to how to proceed with this?

Thanks,
Anurag

I have a regime-switching model implementation you can have a look at: https://github.com/junpenglao/Planet_Sakaar_Data_Science/blob/master/Ports/Regime-switching%20models%20in%20PyMC3.ipynb


Thanks junpenglao.

This was really helpful. I could have built the non-theano version by myself, but it was harder for me to see how to do it with theano, so thank you.

My question is about how to modify the code when the matrix P is not constant but also changes with time. As a theano newbie, I find it hard to see how that can be done with the scan function. Could you give a hint?

Thanks,
Anurag

Depending on how it changes with time - you can either move P from non_sequences to outputs_info (it will get modified inside ft_xit_dt and become one of the outputs, if the change over time is deterministic), or move it into sequences like Eta, i.e. a theano array of the same length that you construct outside of the scan.
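
For concreteness, here is a minimal, self-contained sketch of the second option (passing the time-varying matrix through sequences). The toy 2-state step function and the names eta, P_t, p11, p22 are illustrative stand-ins, not the notebook's actual code:

import numpy as np
import theano
import theano.tensor as tt

T = 50
eta = tt.as_tensor_variable(np.random.rand(T, 2))        # toy per-step emission likelihoods
p11 = tt.as_tensor_variable(np.full(T, 0.9))             # regime-1 persistence (constant here)
p22 = 1.0 - tt.exp(-6.0 + 0.05 * tt.arange(T) / 20.0)    # regime-2 persistence, varying with t
P_t = tt.stack([tt.stack([p11, 1 - p11], axis=-1),
                tt.stack([1 - p22, p22], axis=-1)], axis=1)   # shape (T, 2, 2)

def ft_xit_dt(Eta, P, ft, Xi):
    # one forward-filtering step of a 2-state switching model
    xit0 = tt.stack([Xi, 1 - Xi])
    ft = tt.sum(tt.dot(xit0, P) * Eta)
    Xi1 = (P[0, 0] * Xi + P[1, 0] * (1 - Xi)) * Eta[0] / ft
    return [ft, Xi1]

# P_t has the same leading (time) dimension as eta, so scan walks over both together
([ft, xi], updates) = theano.scan(ft_xit_dt,
                                  sequences=[eta, P_t],
                                  outputs_info=[tt.constant(0.0), tt.constant(0.5)])

Inside a pm.Model, p11 and p22 would come from the random variables rather than constants, and xi_init would replace the 0.5 placeholder in outputs_info.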


Thanks Junpenglao. That worked. Just for posterity, I am pasting the main modifications to the code here:

def ft_xit_dt(Eta, p_curr, ft, Xi):
    P = tt.stack(tt.stack([p_curr[0], 1-p_curr[0]]), tt.stack([1-p_curr[1], p_curr[1]]))

    Xi_ = tt.shape_padleft(Xi)
    xit0 = tt.stack([Xi_, 1 - Xi_], axis=1).T
    ft = tt.sum(tt.dot(xit0 * P, Eta))
    Xi1 = (P[0, 0] * Xi + P[1, 0] * (1 - Xi)) * Eta[0] / ft
    return [ft, Xi1]

and

with pm.Model() as m:
    # Transition matrix
    BoundedNormal = pm.Bound(pm.Normal, lower=-30, upper=0)
    BoundedNormal2 = pm.Bound(pm.Normal, lower=0, upper=0.5)
    logalpha = BoundedNormal('logalpha', mu=0, sd=10)
    beta =  BoundedNormal2('beta', mu=0, sd=0.5)
    p2 = pm.Beta('p2', alpha=10., beta=2.)
    p_time = []
    for i in range(1000):
        p1 = (1-tt.exp(logalpha + beta*i/20))
        p_curr = tt.stack([p2, p1])
        p_time.append(p_curr)
    p = tt.stack(p_time, axis=0)   
    # eta
    alpha1 = pm.Normal('alpha1', mu=0., sd=1)
    alpha2 = pm.HalfNormal('alpha2',sd=1)
    sigma = pm.HalfCauchy('sigma', beta=1., shape=2)
    eta1 = tt.zeros(yshared.shape)
    eta1 = tt.exp(pm.Normal.dist(mu=alpha1, sd=sigma[0]).logp(yshared))

    eta2 = tt.zeros_like(eta1)
    eta2 = tt.set_subtensor(eta2[0], 1)
    eta2 = tt.set_subtensor(eta2[1:], tt.exp(
        pm.Normal.dist(mu=alpha2 + yshared[:-1], sd=sigma[1]).logp(yshared[1:])))

    eta = tt.stack([eta1, eta2], axis=1)
    
    xi_init = pm.Beta('xi_init', alpha=2., beta=2.)
    ft_out = theano.shared(0.) # place holder
    ([ft, xi], updates) = theano.scan(ft_xit_dt,
                                      sequences=[eta, p],
                                      outputs_info=[ft_out, xi_init])

    Xi = pm.Deterministic('Xi', xi)
    # likelihood `target += sum(log(f))`
    pm.Potential('likelihood', tt.sum(tt.log(ft)))

This works pretty well for a single emission sequence.

However, I have the following issue:
I am trying to train on 300-400 sequences with a maximum of around 300 emissions per sequence. I am able to build the model in the same way, but it takes a huge amount of time to start up and the RAM requirements are huge too.

Is this expected? Is there any way around this?

Thanks again,
Anurag


It is somewhat expected because of the use of a Python for-loop to construct p - I think for performance you would need to rewrite it as a theano.scan as well.
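
As a rough illustration (not the exact model code; the constants below are placeholders for the pm random variables p2, logalpha and beta), building p with theano.scan could look something like this:

import theano
import theano.tensor as tt

# stand-ins for the model's random variables
p2 = tt.constant(0.8)
logalpha = tt.constant(-6.0)
beta = tt.constant(0.05)

def p_step(t, p2, logalpha, beta):
    # one row of the time-varying transition probabilities
    p1 = 1 - tt.exp(logalpha + beta * t / 20)
    return tt.stack([p2, p1])

# p has shape (1000, 2) and is built inside the graph instead of with a Python loop
p, _ = theano.scan(p_step,
                   sequences=tt.arange(1000),
                   non_sequences=[p2, logalpha, beta])

A fully vectorized construction (tt.arange plus broadcasting, with no scan at all) would likely be faster still, but either way the graph size stops growing with the number of time steps.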

Thanks Junpeng. Using theano.scan to build the p matrix reduced the memory footprint massively (by roughly 60%) and the computation time as well (by about a third). Now I am wondering whether there is a better way to handle multiple chains (in the most general case, chain lengths can vary too), so I am putting my code here:

def ft_xit_dt(Eta, p_curr, ft, Xi):
    P = tt.stack(tt.stack([p_curr[0], p_curr[1], p_curr[2]]),
                 tt.stack([p_curr[3], p_curr[4], p_curr[5]]),
                 tt.stack([p_curr[6], p_curr[7], p_curr[8]]))
    P = tt.set_subtensor(P[0, 0], 1 - P[0, 1] - P[0, 2])
    Xi1 = (P[0, 0] * Xi[0] + P[1, 0] * Xi[1] + P[2, 0] * Xi[2]) * Eta[0]
    Xi2 = (P[0, 1] * Xi[0] + P[1, 1] * Xi[1] + P[2, 1] * Xi[2]) * Eta[1]
    Xi3 = (P[0, 2] * Xi[0] + P[1, 2] * Xi[1] + P[2, 2] * Xi[2]) * Eta[2]
    ft = Xi1 + Xi2 + Xi3
    Xi_ = tt.stack([Xi1, Xi2, Xi3])/ft
    return [ft, Xi_]


def step(m_row, cumulative_product):
    return m_row * cumulative_product

and

with pm.Model() as m:
    # Transition matrix
    BoundedNormal = pm.Bound(pm.Normal, lower=-30, upper=0)
    BoundedNormal2 = pm.Bound(pm.Normal, lower=0, upper=0.5)
    logalpha = BoundedNormal('logalpha', mu=0, sd=10)
    beta = BoundedNormal2('beta', mu=0, sd=0.5)

    BoundedNormal3 = pm.Bound(pm.Normal, lower=0, upper=0.1)
    p1 = BoundedNormal3('p1', sd=1)
    p2 = pm.Dirichlet('p2', a=np.array([1, 10, 1]))
    p3 = pm.Dirichlet('p3', a=np.array([1, 1, 10]))

    # eta
    alpha = pm.Normal('alpha', mu=0., sd=1, shape=3)
    sigma = pm.HalfCauchy('sigma', beta=1., shape=3)
    xi_init = np.array([0.7, 0.3, 0])
    # pm.Dirichlet('xi_init', a=np.array([3, 1, 0]))
    potential = tt.dscalar('potential')
    potential = 0
    for (i, curr_row) in df.iterrows():
        data = curr_row[curr_row.notna()].values
        yshared = theano.shared(data)
        n = data.shape[0]
        eta1 = tt.zeros_like(yshared)
        eta1 = tt.set_subtensor(eta1[0], tt.exp(pm.Normal.dist(
            mu=alpha[0], sd=sigma[0]).logp(yshared[0])))
        eta1 = tt.set_subtensor(eta1[1:],
                                tt.exp(pm.Normal.dist(
                                    mu=alpha[0] + yshared[:-1], sd=sigma[0])
                                       .logp(yshared[1:])))
        eta2 = tt.zeros_like(eta1)
        eta2 = tt.set_subtensor(eta2[0], tt.exp(pm.Normal.dist(
            mu=alpha[1], sd=sigma[1]).logp(yshared[0])))
        eta2 = tt.set_subtensor(eta2[1:], tt.exp(
            pm.Normal.dist(mu=alpha[1] + yshared[:-1],
                           sd=sigma[1]).logp(yshared[1:])))
        eta3 = tt.zeros_like(eta1)
        eta3 = tt.set_subtensor(eta3[0], tt.exp(pm.Normal.dist(
            mu=alpha[2], sd=sigma[1]).logp(yshared[0])))
        eta3 = tt.set_subtensor(eta3[1:], tt.exp(
            pm.Normal.dist(mu=alpha[2] + yshared[:-1],
                           sd=sigma[2]).logp(yshared[1:])))
        eta = tt.stack([eta1, eta2, eta3], axis=1)
        M1 = tt.stack([tt.ones_like(eta1)*tt.exp(beta/4),
                       tt.ones_like(eta1),
                       tt.ones_like(eta1),
                       tt.ones_like(eta1),
                       tt.ones_like(eta1),
                       tt.ones_like(eta1),
                       tt.ones_like(eta1),
                       tt.ones_like(eta1),
                       tt.ones_like(eta1)
                       ],
                      axis=1)
        M1 = tt.set_subtensor(M1[0, 2], tt.exp(logalpha))
        M1 = tt.set_subtensor(M1[0, 1], p1)
        M1 = tt.set_subtensor(M1[0, 3:6], p2[:])
        M1 = tt.set_subtensor(M1[0, 6:9], p3[:])
        s = np.ones(9)
        p, updates = theano.scan(fn=step,
                                 sequences=[M1],
                                 outputs_info=[s])
        ft_out = theano.shared(0.)
        ([ft, xi], updates) = theano.scan(ft_xit_dt,
                                          sequences=[eta, p],
                                          outputs_info=[ft_out, xi_init])

       
        potential = potential + tt.sum(tt.log(ft))

    pm.Potential('likelihood', potential)

Thanks,
Anurag


in the most general case, chain lengths can vary too

This is a bit difficult, but maybe you can make a batched version of the theano.scan and pad all the sequences to the same length using NaNs? Then you can get rid of the for-loop as well.
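
Not code from this thread, just a small numpy sketch of the padding idea: ragged sequences are stacked into one rectangular array with NaN in the padded tail, plus a boolean mask that a batched likelihood could use to ignore the padded steps.

import numpy as np

# toy sequences of different lengths
seqs = [np.random.randn(n) for n in (300, 120, 250)]
T = max(len(s) for s in seqs)

# one (n_sequences, T) array, NaN-padded on the right
batch = np.full((len(seqs), T), np.nan)
for i, s in enumerate(seqs):
    batch[i, :len(s)] = s

# True where there is real data; padded steps can be zeroed out of the log-likelihood
mask = ~np.isnan(batch)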

How do you make pymc3 ignore the NaNs in this case?

I am having problems with a nested theano scan to do this. I have tried to debug it but have not been too successful, so I was hoping you could help me:

def likelihood_per_individual(ix, eta_curr, p_curr, ft_prev, xi_prev):
    xi_curr = eta_curr_per_ind * (tt.dot(p_curr_per_ind.T, xi_prev[0:2]))
    ft_curr = tt.sum(xi_curr)
    xi_curr = xi_curr/ft_curr
    return [ft_curr, xi_curr]

and:

def likelihood(eta_curr, p_curr, ft, xi):
    ([ft_, Xi_], updates) = theano.scan(fn=likelihood_per_individual,
                                        sequences=tt.arange(eta_curr.shape[0]),
                                        non_sequences=[eta_curr, p_curr],
                                        outputs_info=[ft_k_out, xi_k_out])
    #likeli
    return [ft_, Xi_]

along with:

([Ft, Xi], updates) = theano.scan(likelihood,
                                  sequences=[eta_, p_],
                                  outputs_info=[ft_out, xi_out])
likelihood_ = theano.function(inputs=[eta_, p_, ft_out, xi_out],
                              outputs=[Ft, Xi], updates=updates)

gives the following error message:

MissingInputError: Input 5 of the graph (indices start from 0), used to compute for{cpu,scan_fn}(Elemwise{minimum,no_inplace}.0, Subtensor{:int64:}.0, Subtensor{:int64:}.0, IncSubtensor{Set;:int64:}.0, IncSubtensor{Set;:int64:}.0, ft_k_out, xi_k_out), was not provided and not given a value. Use the Theano flag exception_verbosity='high', for more information on this error.

Backtrace when that variable is created:

File “/home/anurag/pymc3/lib/python3.6/site-packages/IPython/core/interactiveshell.py”, line 2845, in _run_cell
return runner(coro)
File “/home/anurag/pymc3/lib/python3.6/site-packages/IPython/core/async_helpers.py”, line 67, in _pseudo_sync_runner
coro.send(None)
File “/home/anurag/pymc3/lib/python3.6/site-packages/IPython/core/interactiveshell.py”, line 3020, in run_cell_async
interactivity=interactivity, compiler=compiler, result=result)
File “/home/anurag/pymc3/lib/python3.6/site-packages/IPython/core/interactiveshell.py”, line 3185, in run_ast_nodes
if (yield from self.run_code(code, result)):
File “/home/anurag/pymc3/lib/python3.6/site-packages/IPython/core/interactiveshell.py”, line 3267, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File “”, line 7, in
outputs_info=[ft_out, xi_out])
File “/home/anurag/pymc3/lib/python3.6/site-packages/theano/scan_module/scan.py”, line 774, in scan
condition, outputs, updates = scan_utils.get_updates_and_outputs(fn(*args))
File “”, line 6, in likelihood
ft_k_out = tt.dscalar(“ft_k_out”)

Could you please help me figure this out?

Thanks,
anurag

Do you have a notebook with data somewhere?

It is committed here:

pinging @junpenglao. Any recommendations?

Sorry, I don't have much time this week, but from a quick glance, you should not pass a theano function to tt.scan:

    ([ft_, Xi_], updates) = theano.scan(fn=likelihood_per_individual,
                                        sequences=tt.arange(eta_curr.shape[0]),
                                        non_sequences=[eta_curr, p_curr],
                                        outputs_info=[ft_k_out, xi_k_out])

Instead, likelihood_per_individual should be a Python function that calls theano.scan internally.
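
A toy, self-contained sketch of that pattern (the shapes and names here are made up, not the model's): the step function handed to the outer scan is an ordinary Python function, and it builds the inner theano.scan when the graph is constructed.

import numpy as np
import theano
import theano.tensor as tt

X = tt.as_tensor_variable(np.random.rand(4, 3))   # outer axis: time, inner axis: individuals

def outer_step(x_t, acc):
    # plain Python function: build an inner scan over the individuals at time t
    def inner_step(x_ti, running_sum):
        return running_sum + x_ti
    inner_sums, _ = theano.scan(inner_step,
                                sequences=x_t,
                                outputs_info=tt.constant(0.0))
    return acc + inner_sums[-1]

totals, _ = theano.scan(outer_step,
                        sequences=X,
                        outputs_info=tt.constant(0.0))
print(theano.function([], totals)())   # cumulative row sums, one value per time step

Note that the inner scan's initial state comes from a variable that exists in scope (here a constant) rather than a fresh tt.dscalar created inside the function, which appears to be what triggered the MissingInputError above.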


Hi @junpenglao

I tried that at:

But it is still running into errors. They are related to passing updates from the inner scan to the outer scan, but I cannot understand theano's error message well enough to debug it. What would you recommend, based on the equivalent numpy version of the function in cells 12 through 17?

Thanks again,
Anurag

Hi Anurag,
In general, the key to turning a for-loop into theano.scan (or tf.while_loop in TensorFlow) is to keep the functional form in mind - you need to be able to express the body of the for-loop as one single function. We start by looking at the numpy version:

import numpy as np
import matplotlib.pyplot as plt

eta0 = np.random.rand(100, 1000, 3)

def ftfunc(P_curr, xi, eta):
    xi2 = eta * (np.dot(P_curr.T, xi[0:2]))
    return xi2

p = np.zeros((100, 1000, 2, 3))
for i in range(100):
    for j in range(1000):
        p[i, j, 0, 0] = 0.95 - np.exp(-6) * np.exp(0.05*j/20)
        p[i, j, 0, 1] = 0.05
        p[i, j, 0, 2] = np.exp(-6) * np.exp(0.05*j/20)
        p[i, j, 1, 0] = 0.01
        p[i, j, 1, 1] = 0.94
        p[i, j, 1, 2] = 0.05

ft2 = np.zeros((100, 1000))
xi2 = np.zeros((100, 1000, 3))


for i in range(100):
    xi2[i, 0] = ftfunc(p[i, 0], np.asarray([1, 0, 0]), eta0[i, 0])

ft2[:, 0] = xi2[:, 0].sum(axis=1)

xi2[:, 0, 0] = xi2[:, 0, 0]/ft2[:, 0]
xi2[:, 0, 1] = xi2[:, 0, 1]/ft2[:, 0]
xi2[:, 0, 2] = xi2[:, 0, 2]/ft2[:, 0]


for i in range(100):
    for j in range(1, 1000):
        P_curr = p[i, j]
        Eta = eta0[i, j]
        Xi_ = xi2[i, j - 1]
        xi2[i, j] = ftfunc(P_curr, Xi_, Eta)
        ft2[:, j] = xi2[:, j].sum(axis=1)
        xi2[:, j, 0] = xi2[:, j, 0]/ft2[:, j]
        xi2[:, j, 1] = xi2[:, j, 1]/ft2[:, j]
        xi2[:, j, 2] = xi2[:, j, 2]/ft2[:, j]

plt.plot(xi2[:, 99, :]);

A couple of things immediately jump out at me:

  1. the generation of the time-varying transition matrix P and the initial latent state xi2 should both be folded into the nested for-loop. Likely we don't need to construct P explicitly;
  2. ordering of the nested for-loop: since we can think of this as 100 copies of the time series, and the time dimension is the one that has to go through theano.scan, it is best to vectorize ftfunc and keep only one loop, which will then be translated into a scan:
ft2_new = np.zeros((1000, 100, 1))
xi2_new = np.zeros((1000, 100, 3))

eta0_ = np.transpose(eta0, [1, 0, -1])

def generate_p(j):
    p = np.zeros((2, 3))
    p[0, 0] = 0.95 - np.exp(-6) * np.exp(0.05*j/20)
    p[0, 1] = 0.05
    p[0, 2] = np.exp(-6) * np.exp(0.05*j/20)
    p[1, 0] = 0.01
    p[1, 1] = 0.94
    p[1, 2] = 0.05
    return p

def ftfunc_batch(P_curr, xi, eta):
    # shape changes here is ((3, 2), (j, 1, 2)) --> (j, 3, 2) --> (j, 3)
    xi2 = eta * np.sum(P_curr.T * xi[:, np.newaxis, :2], axis=-1)
    return xi2

# run once so we have the initial state
p_curr = generate_p(0)
xi2_j = ftfunc_batch(p_curr, np.asarray([[1, 0, 0]]), eta0_[0])
ft2_new[0] = np.sum(xi2_j, axis=-1, keepdims=True)
xi2_new[0] = xi2_j / ft2_new[0]

for j in range(1, 1000):
    last_states = generate_p(j), xi2_new[j-1], eta0_[j]
    xi_new = ftfunc_batch(*last_states)
    ft2_new[j] = np.sum(xi_new, axis=-1, keepdims=True)
    xi2_new[j] = xi_new / ft2_new[j]

np.testing.assert_array_almost_equal(xi2_new[j], xi2[:, j])
plt.plot(xi2_new[99]);

Now, recall that the function signature of theano.scan is:

The general order of function parameters to fn is:

sequences (if any), prior result(s) (if needed), non-sequences (if any)

Thus, we can further restructure the for-loop into:

ft2_new = np.zeros((1000, 100, 1))
xi2_new = np.zeros((1000, 100, 3))

eta0_ = np.transpose(eta0, [1, 0, -1])

def generate_p(j):
    p = np.zeros((2, 3))
    p[0, 0] = 0.95 - np.exp(-6) * np.exp(0.05*j/20)
    p[0, 1] = 0.05
    p[0, 2] = np.exp(-6) * np.exp(0.05*j/20)
    p[1, 0] = 0.01
    p[1, 1] = 0.94
    p[1, 2] = 0.05
    return p

def ftfunc_batch(eta, P_curr, xi, fi):
    # eta, P_curr is sequence
    # xi, fi is previous result
    # shape changes here is ((3, 2), (j, 1, 2)) --> (j, 3, 2) --> (j, 3)
    xi2 = eta * np.sum(P_curr.T * xi[:, np.newaxis, :2], axis=-1)
    fi2 = np.sum(xi2, axis=-1, keepdims=True)
    return xi2 / fi2, fi2

# run once so we have the initial state
p_curr = generate_p(0)
xi2_new[0], ft2_new[0] = ftfunc_batch(
    eta0_[0], p_curr, np.asarray([[1, 0, 0]]), None)


for j in range(1, 1000):
    sequence_input = eta0_[j], generate_p(j)
    prior_results = xi2_new[j-1], ft2_new[j-1]
    
    xi2_new[j], ft2_new[j] = ftfunc_batch(*sequence_input, *prior_results)

np.testing.assert_array_almost_equal(xi2_new[j], xi2[:, j])
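
For what it is worth, here is one way (my own sketch, not part of the original reply) that this restructured loop could then be written as a single theano.scan, reusing eta0 and generate_p from the code above:

import numpy as np
import theano
import theano.tensor as tt

# same toy data and generate_p as in the numpy code above
eta_seq = tt.as_tensor_variable(np.transpose(eta0, [1, 0, 2]))                  # (1000, 100, 3)
p_seq = tt.as_tensor_variable(np.stack([generate_p(j) for j in range(1000)]))   # (1000, 2, 3)

def ftfunc_scan(eta_t, p_t, xi_tm1, ft_tm1):
    # eta_t: (100, 3), p_t: (2, 3), xi_tm1: (100, 3), ft_tm1: (100, 1)
    xi = eta_t * tt.sum(p_t.T * xi_tm1[:, :2].dimshuffle(0, 'x', 1), axis=-1)
    ft = tt.sum(xi, axis=-1, keepdims=True)
    return xi / ft, ft

xi0 = tt.as_tensor_variable(np.tile([1., 0., 0.], (100, 1)))   # (100, 3) initial state
ft0 = tt.as_tensor_variable(np.ones((100, 1)))                 # unused by the step itself
([xi_all, ft_all], updates) = theano.scan(ftfunc_scan,
                                          sequences=[eta_seq, p_seq],
                                          outputs_info=[xi0, ft0])
# tt.sum(tt.log(ft_all)) would then go into pm.Potential, as in the single-sequence model

The run-once initialisation in the numpy version is absorbed into outputs_info: the first scan iteration applies the same update to xi0.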

Hope this gives you some inspiration to solve your problem.
