Observations of different lengths (Hidden Markov Model)

j_catulo · May 26, 2022, 11:57am

Hi!

I’m trying to get posterior samples from a Hidden Markov Model (HMM) with multiple sequences of observations of different lengths. I was able to get posterior samples using observations of the same length, but now I want to expand my problem and use observations of different lengths.

My idea was to obtain separate posterior samples as InferenceData from each set of observations with the same length and then use arviz.concat to combine the multiple traces.
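The grouping step of this idea can be sketched as follows. This is only an illustration: `group_by_length` is a hypothetical helper, not part of PyMC3 or pymc3-hmm, and it assumes each sequence is one-dimensional.

```python
from collections import defaultdict

import numpy as np


def group_by_length(sequences):
    """Stack sequences that share a length into one 2-D array per length."""
    groups = defaultdict(list)
    for seq in sequences:
        arr = np.asarray(seq)
        groups[arr.shape[-1]].append(arr)
    # Each value has shape (number of sequences, sequence length).
    return {length: np.stack(seqs) for length, seqs in groups.items()}
```

Each value of the returned dictionary could then be fitted on its own, in the same way the equal-length case is fitted already.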

To build the model I used the pymc3-hmm library, which provides distributions and step methods that can be used in PyMC3 models. Here is my code:

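import numpy as np
import pymc3 as pm
import theano.tensor as tt
from pymc3_hmm.distributions import DiscreteMarkovChain, SwitchingProcess

N_states = 3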

with pm.Model() as model:

    observations = pm.Data('data', data_sequences[0])

    Pt = pm.Dirichlet("p_transition", np.ones( (N_states, N_states) ), shape=(N_states, N_states))
    P0 = pm.Dirichlet("p_init", np.ones((N_states,)), shape=(N_states,))

    mu1 = pm.Normal('mu1', mu=-30, sigma=5 )
    mu2 = pm.Normal('mu2', mu=-15, sigma=5 )
    mu3 = pm.Normal('mu3', mu=-5, sigma=5 )

    mu = tt.stack( [mu1,mu2,mu3] )

    sigma = pm.HalfNormal("sigma", sigma=5, shape=(N_states,) )

    comp_dists = [pm.Normal.dist(mu = mu[i], sigma = sigma[i]) for i in range(0, N_states)]

    Z_rv = DiscreteMarkovChain("Z_t", tt.shape_padleft(Pt), P0, shape = tt.shape(observations)[-1].eval() )
    X_rv = SwitchingProcess("X_t", comp_dists, Z_rv, observed = observations)

The code I use to draw samples from the model is shown below.

traces = []
for data_vals in data_sequences:
    with model:
        pm.set_data({'data': data_vals})
        traces.append(pm.sample(return_inferencedata=True, chains=2))

However, I get the following error:

IndexError: boolean index did not match indexed array along dimension 0; dimension is 21 but corresponding boolean dimension is 12
Apply node that caused the error: AdvancedSubtensor(<theano.tensor.extra_ops.BroadcastTo object at 0x7f0612d37d10>.0, Elemwise{eq,no_inplace}.0)
Toposort index: 70
Inputs types: [TensorType(float64, vector), TensorType(bool, vector)]
Inputs shapes: [(21,), (12,)]
Inputs strides: [(0,), (1,)]
Inputs values: ['not shown', 'not shown']
Outputs clients: …

Can anyone help me?

Thank you in advance!

ricardoV94 · May 26, 2022, 1:18pm

Can you share more details about the model? It seems like there is another variable in it that is not compatible with the updated shape of the data.
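The kind of mismatch reported in the traceback can be reproduced in isolation with NumPy. The sketch below is not the model itself. It assumes the length-21 array stands for values broadcast to the new data length and the length-12 boolean mask stands for a comparison on a variable that still has its original length; which variable plays which role in the real model is an assumption here.

```python
import numpy as np

# Stand-in for values broadcast to the length of the updated data.
values = np.zeros(21)
# Stand-in for a boolean comparison on a variable that kept its old length.
mask = np.zeros(12, dtype=bool)

try:
    values[mask]
    message = None
except IndexError as err:
    # Boolean indexing requires the mask and the array to have equal length.
    message = str(err)

print(message)
```

Any variable whose shape was fixed when the model was built would produce this pattern once the data is replaced by a sequence of a different length.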

j_catulo · May 26, 2022, 2:28pm

Thank you for your time!

I have edited my post and added more information. I tried to simplify the model as much as I could to make it easier to read (so the priors probably do not make sense, but that is not relevant to this question).

Thiago_Neto · May 31, 2022, 6:39pm

Hello, @j_catulo.

I am facing a similar problem. Were you able to understand where the error came from?

If so, could you please share how you solved it?

Thank you

j_catulo · June 1, 2022, 5:09pm

Hi!

Sorry, I was not able to implement it without errors. I am still using sequences of the same length to obtain the posterior samples.
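One possible reading of the traceback is that `tt.shape(observations)[-1].eval()` is evaluated once, when the model is defined, so `Z_t` keeps the length of the first sequence even after `pm.set_data` swaps in a longer or shorter one. Under that assumption, a workaround that might be worth trying is to build a fresh model for every sequence instead of reusing one. The sketch below has not been checked against pymc3-hmm; `build_hmm` and `fit_each` are hypothetical names, the model body is copied from the original post, and the sampler is passed in as a callable.

```python
import numpy as np

N_STATES = 3


def build_hmm(obs):
    """Hypothetical factory: a fresh model sized to one observation sequence."""
    # Lazy imports: PyMC3 and pymc3-hmm are only needed when a model is built.
    import pymc3 as pm
    import theano.tensor as tt
    from pymc3_hmm.distributions import DiscreteMarkovChain, SwitchingProcess

    obs = np.asarray(obs, dtype=float)
    with pm.Model() as model:
        Pt = pm.Dirichlet("p_transition", np.ones((N_STATES, N_STATES)),
                          shape=(N_STATES, N_STATES))
        P0 = pm.Dirichlet("p_init", np.ones(N_STATES), shape=(N_STATES,))
        mu = tt.stack([
            pm.Normal("mu1", mu=-30, sigma=5),
            pm.Normal("mu2", mu=-15, sigma=5),
            pm.Normal("mu3", mu=-5, sigma=5),
        ])
        sigma = pm.HalfNormal("sigma", sigma=5, shape=(N_STATES,))
        comp_dists = [pm.Normal.dist(mu=mu[i], sigma=sigma[i])
                      for i in range(N_STATES)]
        # Assumption: the chain length is taken from this sequence,
        # not from whichever sequence the model was first defined with.
        Z_rv = DiscreteMarkovChain("Z_t", tt.shape_padleft(Pt), P0,
                                   shape=obs.shape[-1])
        SwitchingProcess("X_t", comp_dists, Z_rv, observed=obs)
    return model


def fit_each(sequences, build, sample):
    """Build one model per sequence and call `sample()` inside its context."""
    traces = []
    for obs in sequences:
        with build(obs):
            traces.append(sample())
    return traces
```

With PyMC3 imported as `pm`, the call would look like `fit_each(data_sequences, build_hmm, lambda: pm.sample(return_inferencedata=True, chains=2))`. Each sequence is fitted independently here, so nothing is shared between the fits.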
