Observed sequences in a loop: how is the log-likelihood estimated? Are they all added together?

I have one HMM model for multiple sequences. Each sequence has a different length, so I built the model in the following way:

all_seq # the observed data sequences
with pm.Model() as model:
    P = pm.Dirichlet('P_transition', a=np.ones((N_states,N_states)), shape=(N_states,N_states))
    Pe = pm.Dirichlet('P_emission', a=np.ones((N_states,N_labels)), shape=(N_states,N_labels))
    pos_states_seqs = []
    pos_emission_seqs = []
    for i in range(sequence_length):
        all_seq_i = all_seq[i]
        states_seq_i = HMMStates(...)
        emission_seq_i = HMMMultinomialEmissions(..., observed=all_seq_i)
        pos_states_seqs += [states_seq_i]
        pos_emission_seqs += [emission_seq_i]

    print("find MAP")
    start = pm.find_MAP()
    print("run EM steps")
    step1 = pm.Metropolis(vars=[P, Pe]+pos_emission_seqs)
    step2 = pm.CategoricalGibbsMetropolis(vars=pos_states_seqs)
    trace = pm.sample(10000, start=start, step=[step1,step2])

My questions are:

  1. How is the observed data's log-likelihood computed in this looping approach? Is each sequence's contribution added together?
  2. Do you know of any better inference approaches for this HMM problem?

Each observed sequence in the `range(sequence_length)` loop generates an HMMMultinomialEmissions random variable, and the logp of each of them is added together at the end to produce one tensor as the model logp.
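To illustrate the additive combination, here is a minimal NumPy sketch with purely hypothetical per-sequence logp values (not real model output):

```python
import numpy as np

# Hypothetical log-likelihoods for three observed sequences of
# different lengths (toy numbers, for illustration only).
logp_per_seq = [-12.3, -7.1, -20.5]

# The sequences are conditionally independent given the shared
# transition/emission parameters, so the model logp is their sum.
model_logp = np.sum(logp_per_seq)
```

This is exactly what happens when each `HMMMultinomialEmissions` contributes its own logp term to the joint model logp.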

Generally you want to avoid using latent discrete variables and instead write it as a marginalized mixture model. There may be some examples in Stan that could be ported to PyMC3.

Thank you for a quick reply!

Yes, you are right. The current code is really slow. If I cannot improve the speed, I might have to give up. I checked the marginalized Gaussian mixture model. In my model, I use multinomial distributions for both the emission and transition probabilities of the HMM. I think what you suggest, for the code below, is to replace "Categorical" with a "Mixture" model, right? Could you give more specific instructions on how to optimize it?

import numpy as np
import pymc3 as pm
import theano.tensor as tt

class HMMStatesN(pm.Categorical):
    """Hidden Markov Model states.

    P : tensor
        transition probabilities
        shape = (N_states, N_states)
    PA : tensor
        equilibrium (initial-state) probabilities
        shape = (N_states,)
    """
    def __init__(self, N_states, PA=None, P=None,
                 *args, **kwargs):
        # deliberately skip pm.Categorical.__init__ (which requires p)
        # and call the pm.Discrete initializer instead
        super(pm.Categorical, self).__init__(*args, **kwargs)
        self.P = P
        self.PA = PA
        self.k = N_states
        self.mode = tt.cast(0, dtype='int64')

    def logp(self, x):
        P = self.P
        PA = self.PA
        PS = P[x[:-1]]  # rows of P conditioned on the previous state: P(x_t | x_{t-1})
        x_i = x[1:]
        ou_like = pm.Categorical.dist(PS).logp(x_i)
        log_prob = pm.Categorical.dist(PA).logp(x[0]) + tt.sum(ou_like)
        return log_prob

class HMMMultinomialEmissionsN(pm.Categorical):
    """Hidden Markov Model multinomial emissions.

    P : tensor
        multinomial choice probabilities
        shape = (N_states, N_station_type)
    states : tensor
        sequence of latent states
        shape = (sequence_length,)
    """
    def __init__(self, states, P=None,
                 *args, **kwargs):
        # as above, skip pm.Categorical.__init__
        super(pm.Categorical, self).__init__(*args, **kwargs)
        self.P = P
        self.states = states
        self.mode = tt.cast(0, dtype='int64')

    def logp(self, x):
        P = self.P
        states = self.states
        PS = P[states]  # emission probabilities for each time step's state
        ou_like = pm.Categorical.dist(PS).logp(x)
        return tt.sum(ou_like)
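For reference, here is a pure-NumPy sketch of what the two `logp` methods above compute, using small toy parameters (the values are assumptions for illustration, not from the model):

```python
import numpy as np

# Toy HMM parameters (assumed values, 2 states, 2 symbols).
PA = np.array([0.6, 0.4])                # initial-state probabilities
P = np.array([[0.7, 0.3], [0.2, 0.8]])   # transition probabilities
Pe = np.array([[0.9, 0.1], [0.3, 0.7]])  # emission probabilities

states = np.array([0, 0, 1, 1])          # latent state sequence
obs = np.array([0, 1, 1, 1])             # observed sequence

# HMMStatesN.logp: log PA[x_0] + sum_t log P[x_{t-1}, x_t]
states_logp = np.log(PA[states[0]]) + np.sum(np.log(P[states[:-1], states[1:]]))

# HMMMultinomialEmissionsN.logp: sum_t log Pe[states_t, obs_t]
emis_logp = np.sum(np.log(Pe[states, obs]))
```

Both reduce to gathering the relevant matrix entries and summing their logs, which is why the later Theano indexing trick works.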

Another, more general question, if you are willing to answer:
Do you think PyMC3 would be compatible with the Viterbi algorithm in my case? Is PyMC3 faster?

Something like that. The Categorical logp evaluates state i by indexing into p[i]; with a mixture, you evaluate the observed value under all states, weighted by the mixture weights.
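A small NumPy sketch of the difference (toy probabilities, assumed for illustration):

```python
import numpy as np

# Toy emission matrix: Pe[state, symbol] = P(obs = symbol | state).
Pe = np.array([[0.9, 0.1], [0.3, 0.7]])
w = np.array([0.5, 0.5])  # mixture weights over the states

obs = 1  # a single observed symbol

# Categorical-style: the state is known (say state 0), so just index.
logp_known_state = np.log(Pe[0, obs])

# Mixture-style: marginalize over all states, weighted by w,
# using a log-sum-exp for numerical stability.
logp_marginal = np.logaddexp.reduce(np.log(w) + np.log(Pe[:, obs]))
```

The marginal version never needs the latent state as a sampled variable, which is the point of the marginalized formulation.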

I have never used or implemented the Viterbi algorithm, so it is hard to say. A dedicated algorithm is usually faster, but a more general algorithm is more flexible. Implementing a specific inference algorithm is quite some work, but we are always happy to guide you through it if you want to try :slight_smile:

Thank you!
I am quite new to PyMC3. Let me figure out the current model first.
I will consider your suggestion :ok_hand::ok_hand:

In my current code for the categorical distribution:

ou_like = pm.Categorical.dist(PS).logp(x_i)

It is actually just selecting the column whose index equals x_i to get the logp of the categorical distribution, right?
Is it possible to replace it directly with a Theano matrix operation?
If so, would it improve performance?

I did a small experiment; using Theano directly is definitely faster:

pm.Categorical.dist(P[[1, 0, 2, 1, 1]]).logp(np.array([1, 0, 2, 1, 1])).eval()

54.1 ms ± 983 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

theano.shared(P)[[1, 0, 2, 1, 1]][tt.arange(5), [0, 1, 1, 2, 0]].eval()

11.4 ms ± 37.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
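The equivalence behind the experiment above can be checked in pure NumPy (toy matrix, values are assumptions for illustration): the Categorical logp of observations given per-step row probabilities is just a gather of (row, column) entries followed by a log and a sum.

```python
import numpy as np

# Assumed row-stochastic probability matrix over 3 categories.
P = np.array([[0.20, 0.50, 0.30],
              [0.60, 0.10, 0.30],
              [0.25, 0.25, 0.50]])

states = np.array([1, 0, 2, 1, 1])  # which row to use at each time step
x = np.array([0, 1, 1, 2, 0])       # observed category at each time step

# Equivalent to pm.Categorical.dist(P[states]).logp(x):
# pick entry P[states[t], x[t]] for each t and sum the logs.
logp = np.log(P[states, x])
total = logp.sum()
```

The Theano version `theano.shared(P)[states][tt.arange(5), x]` performs the same gather, just without building the intermediate distribution object, which explains the speedup.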

However, I would like to know whether this implementation would cause issues for other parts of PyMC3, such as the inference algorithms.

If it is a Theano tensor within a logp calculation, there is usually no problem.

Yes, I tried it and it works; it seems to estimate the correct values. Of course, the speed is still far from what I expect.

I got stuck on how to implement the marginalization for the HMM. The HMM has a chain of latent variables, which seems very complicated to me. Do you have any general suggestions?
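For what it's worth, the standard way to sum out a chain of latent states is the forward algorithm. Here is a log-space NumPy sketch with toy parameters (this is not PyMC3 code, and the parameter values are assumptions for illustration):

```python
import numpy as np

def forward_logp(obs, PA, P, Pe):
    """Log-likelihood of obs with the latent state chain marginalized out
    via the forward recursion, computed in log space for stability."""
    # alpha[i] = log P(obs[0..t], state_t = i)
    alpha = np.log(PA) + np.log(Pe[:, obs[0]])
    for o in obs[1:]:
        # sum over the previous state, then emit the next observation
        trans = np.logaddexp.reduce(alpha[:, None] + np.log(P), axis=0)
        alpha = trans + np.log(Pe[:, o])
    return np.logaddexp.reduce(alpha)

# Toy 2-state HMM (assumed parameters).
PA = np.array([0.6, 0.4])
P = np.array([[0.7, 0.3], [0.2, 0.8]])
Pe = np.array([[0.9, 0.1], [0.3, 0.7]])

seq_logp = forward_logp(np.array([0, 1, 1]), PA, P, Pe)
```

Expressed with Theano ops (e.g. `theano.scan` over the recursion), this would give a single logp tensor with no discrete latent variables, so gradient-based samplers like NUTS could be used on the Dirichlet parameters.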