Markov switching dynamic autoregression

Hi all,

I have been trying to build a Markov switching dynamic autoregression model in pymc3, but I am stuck because I am unfamiliar with theano. Here is a statsmodels example similar in theme to what I want:
http://www.statsmodels.org/stable/examples/notebooks/generated/markov_regression.html?highlight=markov%20switching

I have created and run hidden Markov models in pymc3 based upon similar models:

but the definition of a time-dependent transition matrix with tensors stumps me. Can someone please give a hint as to how to proceed?

Thanks,
Anurag

I have a regime switching model implementation you can have a look at: https://github.com/junpenglao/Planet_Sakaar_Data_Science/blob/master/Ports/Regime-switching%20models%20in%20PyMC3.ipynb


Thanks junpenglao.

This was really helpful. I could build the non-theano version by myself, but it was harder for me to understand how it was done with theano, so thank you.

My question is about how to modify the code when the matrix P is not constant but also changes with time. As a theano newbie, it is hard for me to see how that can be done with the scan function. Could you give a hint?

Thanks,
Anurag

It depends on how it changes with time - you can either move P from non_sequences to outputs_info (it then gets modified inside ft_xit_dt and becomes one of the outputs; this works if the change over time is deterministic), or move it into sequences like Eta, i.e. a theano array of the same length that you construct outside of the scan.
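
For the sequences route, a rough sketch with two regimes (toy values and made-up names, not the exact notebook code) - p is precomputed as a (T, 2) array outside the scan and sliced row by row:

import numpy as np
import theano
import theano.tensor as tt

def step_fn(eta_t, p_t, ft_prev, xi_prev):
    # p_t is one row of the precomputed (T, 2) array of staying probabilities
    P = tt.stack([tt.stack([p_t[0], 1 - p_t[0]]),
                  tt.stack([1 - p_t[1], p_t[1]])])
    xi_vec = tt.stack([xi_prev, 1 - xi_prev])   # filtered state probabilities at t-1
    pred = tt.dot(xi_vec, P)                    # one-step-ahead state probabilities
    ft = tt.sum(pred * eta_t)                   # predictive density of y_t
    xi_next = pred[0] * eta_t[0] / ft           # filtered P(state 0 | y_1..t)
    return [ft, xi_next]

T = 200
p_seq = tt.as_tensor_variable(np.full((T, 2), 0.9))  # time-varying staying probabilities
eta_seq = tt.as_tensor_variable(np.ones((T, 2)))     # per-step emission likelihoods
([ft, xi], updates) = theano.scan(step_fn,
                                  sequences=[eta_seq, p_seq],  # p is now sliced per step
                                  outputs_info=[theano.shared(0.), tt.constant(0.5)])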


Thanks Junpenglao. That worked. Just for posterity, I am pasting the main modifications to the code here:

import numpy as np
import pymc3 as pm
import theano
import theano.tensor as tt

def ft_xit_dt(Eta, p_curr, ft, Xi):
    # build the 2x2 transition matrix from the current row of the time-varying p
    P = tt.stack(tt.stack([p_curr[0], 1 - p_curr[0]]), tt.stack([1 - p_curr[1], p_curr[1]]))
    Xi_ = tt.shape_padleft(Xi)
    xit0 = tt.stack([Xi_, 1 - Xi_], axis=1).T
    # predictive density of the current observation and updated filtered probability of state 0
    ft = tt.sum(tt.dot(xit0 * P, Eta))
    Xi1 = (P[0, 0] * Xi + P[1, 0] * (1 - Xi)) * Eta[0] / ft
    return [ft, Xi1]

and

with pm.Model() as m:
    # Transition matrix
    BoundedNormal = pm.Bound(pm.Normal, lower=-30, upper=0)
    BoundedNormal2 = pm.Bound(pm.Normal, lower=0, upper=0.5)
    logalpha = BoundedNormal('logalpha', mu=0, sd=10)
    beta =  BoundedNormal2('beta', mu=0, sd=0.5)
    p2 = pm.Beta('p2', alpha=10., beta=2.)
    # time-varying transition probabilities: one (p2, p1) row per time step
    p_time = []
    for i in range(1000):
        p1 = (1-tt.exp(logalpha + beta*i/20))
        p_curr = tt.stack([p2, p1])
        p_time.append(p_curr)
    p = tt.stack(p_time, axis=0)   
    # eta
    alpha1 = pm.Normal('alpha1', mu=0., sd=1)
    alpha2 = pm.HalfNormal('alpha2',sd=1)
    sigma = pm.HalfCauchy('sigma', beta=1., shape=2)
    # yshared holds the observed series; eta1 is the emission likelihood under regime 1
    eta1 = tt.exp(pm.Normal.dist(mu=alpha1, sd=sigma[0]).logp(yshared))

    eta2 = tt.zeros_like(eta1)
    eta2 = tt.set_subtensor(eta2[0], 1)
    eta2 = tt.set_subtensor(eta2[1:], tt.exp(
        pm.Normal.dist(mu=alpha2 + yshared[:-1], sd=sigma[1]).logp(yshared[1:])))

    eta = tt.stack([eta1, eta2], axis=1)
    
    xi_init = pm.Beta('xi_init', alpha=2., beta=2.)
    ft_out = theano.shared(0.)  # placeholder for the initial ft value
    ([ft, xi], updates) = theano.scan(ft_xit_dt,
                                      sequences=[eta, p],
                                      outputs_info=[ft_out, xi_init])

    Xi = pm.Deterministic('Xi', xi)
    # likelihood `target += sum(log(f))`
    pm.Potential('likelihood', tt.sum(tt.log(ft)))

This works for a single emission sequence pretty well.

However, I have the following issue:
I am trying to train on 300-400 sequences, each with a maximum of around 300 emissions. I can build the model in the same way, but it takes a huge amount of time to start up and the RAM requirements are huge too.

Is this expected? Is there any way around this?

Thanks again,
Anurag


It is somewhat expected because of the use of a Python for loop to construct p - I think for performance you would need to rewrite it as a theano.scan as well.
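
Roughly, the loop that builds p could be replaced by something like this (an untested sketch reusing the logalpha, beta and p2 variables from the model above, placed inside the model block where the loop was):

# sketch: build the (1000, 2) array p with theano.scan instead of a Python loop
def p_step(i, logalpha, beta, p2):
    p1 = 1 - tt.exp(logalpha + beta * i / 20.)
    return tt.stack([p2, p1])

p, _ = theano.scan(p_step,
                   sequences=[tt.arange(1000)],
                   non_sequences=[logalpha, beta, p2])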

Thanks Junpeng. That worked: using theano.scan to build the p matrix reduced the memory footprint massively (by roughly 60%) and cut the computation time by about a third. Now I am wondering whether there is a better way to handle multiple chains (in the most general case, chain lengths can vary too), so I am putting my code here:

def ft_xit_dt(Eta, p_curr, ft, Xi):
    # rebuild the 3x3 transition matrix from the flattened 9-element row p_curr
    P = tt.stack(tt.stack([p_curr[0], p_curr[1], p_curr[2]]),
                 tt.stack([p_curr[3], p_curr[4], p_curr[5]]),
                 tt.stack([p_curr[6], p_curr[7], p_curr[8]]))
    P = tt.set_subtensor(P[0, 0], 1 - P[0, 1] - P[0, 2])
    # one filtering step: predictive density ft and normalised state probabilities
    Xi1 = (P[0, 0] * Xi[0] + P[1, 0] * Xi[1] + P[2, 0] * Xi[2]) * Eta[0]
    Xi2 = (P[0, 1] * Xi[0] + P[1, 1] * Xi[1] + P[2, 1] * Xi[2]) * Eta[1]
    Xi3 = (P[0, 2] * Xi[0] + P[1, 2] * Xi[1] + P[2, 2] * Xi[2]) * Eta[2]
    ft = Xi1 + Xi2 + Xi3
    Xi_ = tt.stack([Xi1, Xi2, Xi3]) / ft
    return [ft, Xi_]


def step(m_row, cumulative_product):
    # used with theano.scan to build the time-varying transition entries as a running product
    return m_row * cumulative_product

and

with pm.Model() as m:
    # Transition matrix
    BoundedNormal = pm.Bound(pm.Normal, lower=-30, upper=0)
    BoundedNormal2 = pm.Bound(pm.Normal, lower=0, upper=0.5)
    logalpha = BoundedNormal('logalpha', mu=0, sd=10)
    beta = BoundedNormal2('beta', mu=0, sd=0.5)

    BoundedNormal3 = pm.Bound(pm.Normal, lower=0, upper=0.1)
    p1 = BoundedNormal3('p1', sd=1)
    p2 = pm.Dirichlet('p2', a=np.array([1, 10, 1]))
    p3 = pm.Dirichlet('p3', a=np.array([1, 1, 10]))

    # eta
    alpha = pm.Normal('alpha', mu=0., sd=1, shape=3)
    sigma = pm.HalfCauchy('sigma', beta=1., shape=3)
    xi_init = np.array([0.7, 0.3, 0])
    # pm.Dirichlet('xi_init', a=np.array([3, 1, 0]))
    # running sum of the per-sequence log-likelihood contributions
    potential = 0
    # df holds one emission sequence per row; NaN entries are dropped per sequence
    for (i, curr_row) in df.iterrows():
        data = curr_row[curr_row.notna()].values
        yshared = theano.shared(data)
        n = data.shape[0]
        eta1 = tt.zeros_like(yshared)
        eta1 = tt.set_subtensor(eta1[0], tt.exp(pm.Normal.dist(
            mu=alpha[0], sd=sigma[0]).logp(yshared[0])))
        eta1 = tt.set_subtensor(eta1[1:],
                                tt.exp(pm.Normal.dist(
                                    mu=alpha[0] + yshared[:-1], sd=sigma[0])
                                       .logp(yshared[1:])))
        eta2 = tt.zeros_like(eta1)
        eta2 = tt.set_subtensor(eta2[0], tt.exp(pm.Normal.dist(
            mu=alpha[1], sd=sigma[1]).logp(yshared[0])))
        eta2 = tt.set_subtensor(eta2[1:], tt.exp(
            pm.Normal.dist(mu=alpha[1] + yshared[:-1],
                           sd=sigma[1]).logp(yshared[1:])))
        eta3 = tt.zeros_like(eta1)
        eta3 = tt.set_subtensor(eta3[0], tt.exp(pm.Normal.dist(
            mu=alpha[2], sd=sigma[1]).logp(yshared[0])))
        eta3 = tt.set_subtensor(eta3[1:], tt.exp(
            pm.Normal.dist(mu=alpha[2] + yshared[:-1],
                           sd=sigma[2]).logp(yshared[1:])))
        eta = tt.stack([eta1, eta2, eta3], axis=1)
        M1 = tt.stack([tt.ones_like(eta1)*tt.exp(beta/4),
                       tt.ones_like(eta1),
                       tt.ones_like(eta1),
                       tt.ones_like(eta1),
                       tt.ones_like(eta1),
                       tt.ones_like(eta1),
                       tt.ones_like(eta1),
                       tt.ones_like(eta1),
                       tt.ones_like(eta1)
                       ],
                      axis=1)
        M1 = tt.set_subtensor(M1[0, 2], tt.exp(logalpha))
        M1 = tt.set_subtensor(M1[0, 1], p1)
        M1 = tt.set_subtensor(M1[0, 3:6], p2[:])
        M1 = tt.set_subtensor(M1[0, 6:9], p3[:])
        s = np.ones(9)
        p, updates = theano.scan(fn=step,
                                 sequences=[M1],
                                 outputs_info=[s])
        ft_out = theano.shared(0.)
        ([ft, xi], updates) = theano.scan(ft_xit_dt,
                                          sequences=[eta, p],
                                          outputs_info=[ft_out, xi_init])

       
        potential = potential + tt.sum(tt.log(ft))

    pm.Potential('likelihood', potential)

Thanks,
Anurag


in the most general case, chain lengths can vary too

This is a bit difficult, but maybe you can make a batched version of the theano.scan and pad all the sequences to the same length using NaNs? Then you can get rid of the for loop as well.
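
The padding part could look roughly like this (a toy sketch with made-up array names, not tested against the model above):

import numpy as np
import theano

# hypothetical list of 1-D observation arrays with different lengths
sequences = [np.random.randn(np.random.randint(100, 300)) for _ in range(5)]
max_len = max(len(s) for s in sequences)
padded = np.full((len(sequences), max_len), np.nan)
for i, s in enumerate(sequences):
    padded[i, :len(s)] = s
y_batch = theano.shared(padded)  # shape (n_seq, max_len), NaN where a sequence has ended

# a batched scan would then iterate over time (the second axis), carrying one ft and
# one Xi per sequence; how to neutralise the NaN steps is the open question below.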

How do you make pymc3 ignore the NaNs in this case?