I have one HMM model for multiple sequences, each with a different length, so I built the PyMC3 model in the following way:
import numpy as np
import pymc3 as pm

all_seq  # the observed data sequences

with pm.Model() as model:
    P = pm.Dirichlet('P_transition', a=np.ones((N_states, N_states)), shape=(N_states, N_states))
    Pe = pm.Dirichlet('P_emission', a=np.ones((N_states, N_labels)), shape=(N_states, N_labels))
    ....
    pos_states_seqs = []
    pos_emission_seqs = []
    for i in range(sequence_length):
        all_seq_i = all_seq[i]
        states_seq_i = HMMStates(...)
        emission_seq_i = HMMMultinomialEmissions(..., observed=all_seq_i)
        pos_states_seqs += [states_seq_i]
        pos_emission_seqs += [emission_seq_i]

    print("find MAP")
    start = pm.find_MAP()

    print("run MCMC steps")
    step1 = pm.Metropolis(vars=[P, Pe] + pos_emission_seqs)
    step2 = pm.CategoricalGibbsMetropolis(vars=pos_states_seqs)
    trace = pm.sample(10000, start=start, step=[step1, step2])
My questions are:
- How is the observed data's log-likelihood computed in this looping way? Is each sequence's contribution added together?
- Do you know of any better inference approaches for this HMM problem?
Each observed sequence in the loop over range(sequence_length) generates an HMMMultinomialEmissions random variable, and the logp of each of them is added together at the end to produce one tensor as the model logp.
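Roughly, in code terms (a sketch using the logpt attribute that PyMC3 attaches to each random variable):

import theano.tensor as tt

# each observed HMMMultinomialEmissions RV carries its own logp tensor,
# and the model logp is simply the sum of these (plus the priors' logp)
seq_logps = [seq.logpt for seq in pos_emission_seqs]
total_obs_logp = tt.sum(tt.stack(seq_logps))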
Generally you want to avoid latent discrete variables and write it as a marginalized mixture model. Maybe there are some examples in Stan that could be ported to PyMC3.
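For a single latent category (ignoring the chain structure for a moment) the idea looks roughly like the untested sketch below; the HMM itself additionally needs the transitions summed out:

import numpy as np
import pymc3 as pm

# toy sizes and data, purely for illustration
data = np.array([0, 2, 1, 1, 0])
N_states, N_labels = 3, 4

with pm.Model():
    w = pm.Dirichlet('w', a=np.ones(N_states))               # weights of the latent states
    Pe = pm.Dirichlet('Pe', a=np.ones((N_states, N_labels)),
                      shape=(N_states, N_labels))            # per-state emission probabilities
    comps = [pm.Categorical.dist(p=Pe[k]) for k in range(N_states)]
    # the latent state never appears as a sampled variable:
    # its contribution is summed out by the Mixture
    obs = pm.Mixture('obs', w=w, comp_dists=comps, observed=data)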
Thank you for the quick reply!
Yes, you are right. The current code is really slow; if I cannot improve the speed, I might have to give up. I checked the marginalized Gaussian mixture model example. In my model, I use a multinomial for both the emission and transition probabilities of the HMM. I think what you suggest is to replace the “Categorical” in the following code with a “Mixture” model, right? Maybe you can give more specific instructions on how to optimize it?
class HMMStatesN(pm.Categorical):
    """
    Hidden Markov Model States

    Parameters
    ----------
    P : tensor
        transition probabilities
        shape = (N_states, N_states)
    PA : tensor
        equilibrium probabilities
        shape = (N_states,)
    """

    def __init__(self, N_states, PA=None, P=None, *args, **kwargs):
        super(pm.Categorical, self).__init__(*args, **kwargs)
        self.P = P
        self.PA = PA
        self.k = N_states
        self.mode = tt.cast(0, dtype='int64')

    def logp(self, x):
        P = self.P
        PA = self.PA
        PS = P[x[:-1]]  # rows of P giving P(Y_t | Y_{t-1} = x[:-1])
        x_i = x[1:]
        ou_like = pm.Categorical.dist(PS).logp(x_i)
        log_prob = pm.Categorical.dist(PA).logp(x[0]) + tt.sum(ou_like)
        return log_prob
class HMMMultinomialEmissionsN(pm.Categorical):
    """
    Hidden Markov Model Multinomial Emissions

    Parameters
    ----------
    P : tensor
        multinomial choice probabilities
        shape = (N_states, N_station_type)
    states : tensor
        sequence of latent states
        shape = (sequence length,)
    """

    def __init__(self, states, P=None, *args, **kwargs):
        super(pm.Categorical, self).__init__(*args, **kwargs)
        self.P = P
        self.states = states
        self.mode = tt.cast(0, dtype='int64')

    def logp(self, x):
        P = self.P
        states = self.states
        PS = P[states]  # per-time-step emission probabilities
        ou_like = pm.Categorical.dist(PS).logp(x)
        return tt.sum(ou_like)
Another, more general question, if you are willing to answer:
Do you think PyMC3, in my case, would be comparable to the Viterbi algorithm? Is PyMC3 faster?
Something like that. The Categorical logp evaluates state i by indexing into p[i]; using a mixture, you evaluate the observed value under all states, weighted by the mixture weights.
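As a tiny made-up example (logp_k stands for one observation's logp under each of three states):

import theano.tensor as tt

log_w = tt.log(tt.as_tensor_variable([0.2, 0.5, 0.3]))   # log mixture weights
logp_k = tt.as_tensor_variable([-2.3, -0.5, -1.2])       # observation's logp under each state

logp_latent = logp_k[1]                                   # latent-state version: index the sampled state
logp_marginal = tt.log(tt.sum(tt.exp(log_w + logp_k)))    # marginalized version: logsumexp over all states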
I have never used or implemented the Viterbi algorithm, so it is hard to say. A dedicated algorithm is usually faster, but a more general algorithm is more flexible. Implementing a specific inference algorithm is quite a bit of work, but we are always happy to guide you through it if you want to try.
Thank you!
I am quite new to PyMC3. Let me figure out the current model first; I will consider your suggestion.
In my current code for the categorical:

ou_like = pm.Categorical.dist(PS).logp(x_i)

it is actually just selecting the column whose index is equal to x_i to get the logp of the categorical distribution, right?
Is it possible to replace it directly with a Theano matrix operation?
If replaced, will it improve performance?
I did a small experiment; using Theano directly is definitely faster:
%%timeit
pm.Categorical.dist(P[[1, 0, 2, 1, 1]]).logp(np.array([1, 0, 2, 1, 1])).eval()
54.1 ms ± 983 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%%timeit
theano.shared(P)[[1, 0, 2, 1, 1]][tt.arange(5), [0, 1, 1, 2, 0]].eval()
11.4 ms ± 37.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
However, I would like to know whether this implementation would cause issues for other parts of PyMC3, such as the inference algorithms?
If it is a Theano tensor within a logp calculation, there is usually no problem.
Yes, I tried it and it works; it seems to estimate the correct values. Of course, the speed is still far from what I expect.
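For reference, the direct-indexing variant of the emission logp looks something like this (a sketch of the change, assuming x and states are 1-D integer tensors of the same length):

# inside HMMMultinomialEmissionsN
def logp(self, x):
    P = self.P                        # (N_states, N_labels) emission probabilities
    states = self.states              # latent state at each time step
    PS = P[states]                    # (T, N_labels): one row per time step
    # pick the probability of the observed label directly, instead of
    # building pm.Categorical.dist(PS) and calling .logp(x)
    return tt.sum(tt.log(PS[tt.arange(x.shape[0]), x]))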
I got stuck on how to implement the marginalization for the HMM. An HMM has a chain of latent variables, which seems very complicated to me. Do you have any general suggestions?
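Would something like the following untested sketch be the right direction? It is the forward algorithm written with theano.scan, with one pm.Potential per sequence, reusing N_states, N_labels and all_seq from above (numerical details and edge cases such as length-1 sequences are glossed over):

import numpy as np
import theano
import theano.tensor as tt
import pymc3 as pm

def hmm_logp(obs, P, Pe, PA):
    """Marginal log-likelihood of one observed sequence with the latent
    states summed out (the standard forward algorithm, in log space)."""
    # alpha[k] = log p(obs[0..t], state_t = k)
    alpha0 = tt.log(PA) + tt.log(Pe[:, obs[0]])

    def step(ob, alpha_prev):
        m = tt.max(alpha_prev)
        # sum over the previous state, then add the emission logp of the current obs
        return tt.log(tt.dot(tt.exp(alpha_prev - m), P)) + m + tt.log(Pe[:, ob])

    alphas, _ = theano.scan(fn=step, sequences=obs[1:], outputs_info=alpha0)
    last = alphas[-1]
    return tt.log(tt.sum(tt.exp(last - tt.max(last)))) + tt.max(last)

with pm.Model() as marg_model:
    P = pm.Dirichlet('P_transition', a=np.ones((N_states, N_states)), shape=(N_states, N_states))
    Pe = pm.Dirichlet('P_emission', a=np.ones((N_states, N_labels)), shape=(N_states, N_labels))
    PA = pm.Dirichlet('P_equilibrium', a=np.ones(N_states))
    for i, seq in enumerate(all_seq):
        obs = tt.as_tensor_variable(np.asarray(seq, dtype='int64'))
        # no latent state variables at all: each sequence contributes its marginal logp
        pm.Potential('seq_%d_logp' % i, hmm_logp(obs, P, Pe, PA))
    trace = pm.sample(1000)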