Observed sequences in a loop: how is the log-likelihood estimated? Are they all added together?

I have one HMM model for multiple sequences. Each sequence has a different length, so I built the model in the following way:

all_seq # the observed data sequences
with pm.Model() as model:
    P = pm.Dirichlet('P_transition', a=np.ones((N_states,N_states)), shape=(N_states,N_states))
    Pe = pm.Dirichlet('P_emission', a=np.ones((N_states,N_labels)), shape=(N_states,N_labels))
    pos_states_seqs = []
    pos_emission_seqs = []
    for i in range(sequence_length):
        all_seq_i = all_seq[i]
        states_seq_i = HMMStates(...)
        emission_seq_i = HMMMultinomialEmissions(..., observed=all_seq_i)
        pos_states_seqs += [states_seq_i]
        pos_emission_seqs += [emission_seq_i]

    print("find MAP")
    start = pm.find_MAP()
    print("run EM steps")
    step1 = pm.Metropolis(vars=[P, Pe]+pos_emission_seqs)
    step2 = pm.CategoricalGibbsMetropolis(vars=pos_states_seqs)
    trace = pm.sample(10000, start=start, step=[step1,step2])

My questions are:

  1. How is the observed data's log-likelihood computed in this looping approach? Is each sequence's contribution added together?
  2. Do you know of any better inference approaches for this HMM problem?

Each observed sequence in the `range(sequence_length)` loop generates an HMMMultinomialEmissions random variable, and the logp of each of them is added together at the end to produce one tensor as the model logp.
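To illustrate the additive combination, here is a minimal NumPy sketch with purely hypothetical per-sequence logp values (not real model output):

```python
import numpy as np

# Hypothetical log-likelihoods for three observed sequences of
# different lengths (toy numbers, for illustration only).
logp_per_seq = [-12.3, -7.1, -20.5]

# The sequences are conditionally independent given the shared
# transition/emission parameters, so the model logp is their sum.
model_logp = np.sum(logp_per_seq)
```

This is exactly what happens when each `HMMMultinomialEmissions` contributes its own logp term to the joint model logp.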

Generally you want to avoid using latent discrete variables and instead write it as a marginalized mixture model. There may be some examples in Stan that could be ported to PyMC3.

Thank you for a quick reply!

Yes, you are right. The current code is really slow. If I cannot improve the speed, I might have to give up. I checked the marginalized Gaussian mixture model. In my model, I use multinomial distributions for both the emission and transition probabilities of the HMM. I think what you suggest, for the code below, is to replace "Categorical" with a "Mixture" model, right? Could you give more specific instructions on how to optimize it?

import numpy as np
import pymc3 as pm
import theano.tensor as tt

class HMMStatesN(pm.Categorical):
    """Hidden Markov Model states.

    P : tensor
        transition probabilities
        shape = (N_states, N_states)
    PA : tensor
        equilibrium (initial-state) probabilities
        shape = (N_states,)
    """
    def __init__(self, N_states, PA=None, P=None,
                 *args, **kwargs):
        # deliberately skip pm.Categorical.__init__ (which requires p)
        # and call the pm.Discrete initializer instead
        super(pm.Categorical, self).__init__(*args, **kwargs)
        self.P = P
        self.PA = PA
        self.k = N_states
        self.mode = tt.cast(0, dtype='int64')

    def logp(self, x):
        P = self.P
        PA = self.PA
        PS = P[x[:-1]]  # rows of P conditioned on the previous state: P(x_t | x_{t-1})
        x_i = x[1:]
        ou_like = pm.Categorical.dist(PS).logp(x_i)
        log_prob = pm.Categorical.dist(PA).logp(x[0]) + tt.sum(ou_like)
        return log_prob

class HMMMultinomialEmissionsN(pm.Categorical):
    """Hidden Markov Model multinomial emissions.

    P : tensor
        multinomial choice probabilities
        shape = (N_states, N_station_type)
    states : tensor
        sequence of latent states
        shape = (sequence_length,)
    """
    def __init__(self, states, P=None,
                 *args, **kwargs):
        # as above, skip pm.Categorical.__init__
        super(pm.Categorical, self).__init__(*args, **kwargs)
        self.P = P
        self.states = states
        self.mode = tt.cast(0, dtype='int64')

    def logp(self, x):
        P = self.P
        states = self.states
        PS = P[states]  # emission probabilities for each time step's state
        ou_like = pm.Categorical.dist(PS).logp(x)
        return tt.sum(ou_like)
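For reference, here is a pure-NumPy sketch of what the two `logp` methods above compute, using small toy parameters (the values are assumptions for illustration, not from the model):

```python
import numpy as np

# Toy HMM parameters (assumed values, 2 states, 2 symbols).
PA = np.array([0.6, 0.4])                # initial-state probabilities
P = np.array([[0.7, 0.3], [0.2, 0.8]])   # transition probabilities
Pe = np.array([[0.9, 0.1], [0.3, 0.7]])  # emission probabilities

states = np.array([0, 0, 1, 1])          # latent state sequence
obs = np.array([0, 1, 1, 1])             # observed sequence

# HMMStatesN.logp: log PA[x_0] + sum_t log P[x_{t-1}, x_t]
states_logp = np.log(PA[states[0]]) + np.sum(np.log(P[states[:-1], states[1:]]))

# HMMMultinomialEmissionsN.logp: sum_t log Pe[states_t, obs_t]
emis_logp = np.sum(np.log(Pe[states, obs]))
```

Both reduce to gathering the relevant matrix entries and summing their logs, which is why the later Theano indexing trick works.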

Another, more general question, if you are willing to answer:
Do you think PyMC3 would be compatible with the Viterbi algorithm in my case? Is PyMC3 faster?

Something like that. The Categorical logp evaluates state i by indexing into p[i]; with a mixture, you evaluate the observed value under all states, weighted by the mixture weights.
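A small NumPy sketch of the difference (toy probabilities, assumed for illustration):

```python
import numpy as np

# Toy emission matrix: Pe[state, symbol] = P(obs = symbol | state).
Pe = np.array([[0.9, 0.1], [0.3, 0.7]])
w = np.array([0.5, 0.5])  # mixture weights over the states

obs = 1  # a single observed symbol

# Categorical-style: the state is known (say state 0), so just index.
logp_known_state = np.log(Pe[0, obs])

# Mixture-style: marginalize over all states, weighted by w,
# using a log-sum-exp for numerical stability.
logp_marginal = np.logaddexp.reduce(np.log(w) + np.log(Pe[:, obs]))
```

The marginal version never needs the latent state as a sampled variable, which is the point of the marginalized formulation.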

I have never used or implemented the Viterbi algorithm, so it is hard to say. A dedicated algorithm is usually faster, but a more general algorithm is more flexible. Implementing a specific inference algorithm is quite some work, but we are always happy to guide you through it if you want to try :slight_smile:

Thank you!
I am quite new to PyMC3. Let me figure out the current model first.
I will consider your suggestion :ok_hand::ok_hand:

In my current code for the categorical distribution:

ou_like = pm.Categorical.dist(PS).logp(x_i)

It is actually just selecting the column whose index equals x_i to get the logp of the categorical distribution, right?
Is it possible to replace it directly with a Theano matrix operation?
If so, would it improve performance?

I did a small experiment; using Theano directly is definitely faster:

pm.Categorical.dist(P[[1, 0, 2, 1, 1]]).logp(np.array([1, 0, 2, 1, 1])).eval()

54.1 ms ± 983 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

theano.shared(P)[[1, 0, 2, 1, 1]][tt.arange(5), [0, 1, 1, 2, 0]].eval()

11.4 ms ± 37.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
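The equivalence behind the experiment above can be checked in pure NumPy (toy matrix, values are assumptions for illustration): the Categorical logp of observations given per-step row probabilities is just a gather of (row, column) entries followed by a log and a sum.

```python
import numpy as np

# Assumed row-stochastic probability matrix over 3 categories.
P = np.array([[0.20, 0.50, 0.30],
              [0.60, 0.10, 0.30],
              [0.25, 0.25, 0.50]])

states = np.array([1, 0, 2, 1, 1])  # which row to use at each time step
x = np.array([0, 1, 1, 2, 0])       # observed category at each time step

# Equivalent to pm.Categorical.dist(P[states]).logp(x):
# pick entry P[states[t], x[t]] for each t and sum the logs.
logp = np.log(P[states, x])
total = logp.sum()
```

The Theano version `theano.shared(P)[states][tt.arange(5), x]` performs the same gather, just without building the intermediate distribution object, which explains the speedup.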

However, I would like to know whether this implementation would cause issues for other parts of PyMC3, such as the inference algorithms.

If it is a Theano tensor within a logp calculation, there is usually no problem.

Yes, I tried it and it works; it seems to estimate the correct values. Of course, the speed is still far from what I expect.

I got stuck on how to implement the marginalization for the HMM. The HMM has a chain of latent variables, which seems very complicated to me. Do you have any general suggestions?
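For what it's worth, the standard way to sum out a chain of latent states is the forward algorithm. Here is a log-space NumPy sketch with toy parameters (this is not PyMC3 code, and the parameter values are assumptions for illustration):

```python
import numpy as np

def forward_logp(obs, PA, P, Pe):
    """Log-likelihood of obs with the latent state chain marginalized out
    via the forward recursion, computed in log space for stability."""
    # alpha[i] = log P(obs[0..t], state_t = i)
    alpha = np.log(PA) + np.log(Pe[:, obs[0]])
    for o in obs[1:]:
        # sum over the previous state, then emit the next observation
        trans = np.logaddexp.reduce(alpha[:, None] + np.log(P), axis=0)
        alpha = trans + np.log(Pe[:, o])
    return np.logaddexp.reduce(alpha)

# Toy 2-state HMM (assumed parameters).
PA = np.array([0.6, 0.4])
P = np.array([[0.7, 0.3], [0.2, 0.8]])
Pe = np.array([[0.9, 0.1], [0.3, 0.7]])

seq_logp = forward_logp(np.array([0, 1, 1]), PA, P, Pe)
```

Expressed with Theano ops (e.g. `theano.scan` over the recursion), this would give a single logp tensor with no discrete latent variables, so gradient-based samplers like NUTS could be used on the Dirichlet parameters.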