Hi!
I'm trying to get posterior samples from a hidden Markov model (HMM) fitted to multiple observation sequences of different lengths. I was able to get posterior samples when all sequences had the same length, but now I want to extend the problem to sequences of different lengths.
My idea was to obtain separate posterior samples as InferenceData from each set of observations with the same length and then use arviz.concat to combine the multiple traces.
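To make the grouping step concrete, here is a minimal sketch of how the sequences could be bucketed by length before sampling. The helper name `group_by_length` and the toy values are made up for illustration; they are not part of pymc3-hmm or of my actual script.

```python
from collections import defaultdict

import numpy as np


def group_by_length(sequences):
    """Stack sequences of equal length into one 2-D array per length."""
    groups = defaultdict(list)
    for seq in sequences:
        seq = np.asarray(seq, dtype=float)
        groups[seq.shape[-1]].append(seq)
    # Each value has shape (n_sequences_of_that_length, length).
    return {length: np.stack(seqs) for length, seqs in groups.items()}


# Toy stand-in for data_sequences (hypothetical values).
toy_sequences = [[-30.0, -14.0, -6.0], [-29.0, -16.0], [-5.0, -15.0, -31.0]]
grouped = group_by_length(toy_sequences)
for length, block in sorted(grouped.items()):
    print(length, block.shape)
```

Each entry of `grouped` would then be one "set of observations with the same length", and each set would get its own trace.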
To build the model I used the pymc3-hmm library, which provides distributions and step methods that can be used in PyMC3 models. Here is my code:
import numpy as np
import pymc3 as pm
import theano.tensor as tt
from pymc3_hmm.distributions import DiscreteMarkovChain, SwitchingProcess

N_states = 3
with pm.Model() as model:
    observations = pm.Data('data', data_sequences[0])
    # Transition matrix and initial state probabilities
    Pt = pm.Dirichlet("p_transition", np.ones((N_states, N_states)), shape=(N_states, N_states))
    P0 = pm.Dirichlet("p_init", np.ones((N_states,)), shape=(N_states,))
    # One Normal emission distribution per hidden state
    mu1 = pm.Normal('mu1', mu=-30, sigma=5)
    mu2 = pm.Normal('mu2', mu=-15, sigma=5)
    mu3 = pm.Normal('mu3', mu=-5, sigma=5)
    mu = tt.stack([mu1, mu2, mu3])
    sigma = pm.HalfNormal("sigma", sigma=5, shape=(N_states,))
    comp_dists = [pm.Normal.dist(mu=mu[i], sigma=sigma[i]) for i in range(N_states)]
    # The chain length is evaluated once here, from the data set at model-build time
    Z_rv = DiscreteMarkovChain("Z_t", tt.shape_padleft(Pt), P0, shape=tt.shape(observations)[-1].eval())
    X_rv = SwitchingProcess("X_t", comp_dists, Z_rv, observed=observations)
The code I use to draw samples from the model is shown below.
traces = []
for data_vals in data_sequences:
    with model:
        pm.set_data({'data': data_vals})
        traces.append(pm.sample(return_inferencedata=True, chains=2))
However, I get the following error:
IndexError: boolean index did not match indexed array along dimension 0; dimension is 21 but corresponding boolean dimension is 12
Apply node that caused the error: AdvancedSubtensor(<theano.tensor.extra_ops.BroadcastTo object at 0x7f0612d37d10>.0, Elemwise{eq,no_inplace}.0)
Toposort index: 70
Inputs types: [TensorType(float64, vector), TensorType(bool, vector)]
Inputs shapes: [(21,), (12,)]
Inputs strides: [(0,), (1,)]
Inputs values: ['not shown', 'not shown']
Outputs clients: …
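For what it's worth, the same kind of failure can be reproduced in plain NumPy using only the shapes and strides reported in the traceback. This is a minimal sketch, not the actual Theano graph; the variable names are made up. My guess (an assumption I have not confirmed in the pymc3-hmm source) is that one of the two lengths is fixed when the model is built and no longer matches the data after `pm.set_data`.

```python
import numpy as np

# Shapes taken from the traceback: a broadcast vector of length 21
# (stride 0) indexed by a boolean mask of length 12.
broadcast_values = np.broadcast_to(np.float64(0.0), (21,))
state_mask = np.zeros(12, dtype=bool)

try:
    broadcast_values[state_mask]
    raised = False
except IndexError as err:
    # NumPy refuses boolean masks whose length differs from the indexed axis.
    raised = True
    print(type(err).__name__, err)
```

So the error itself only says that a length-12 mask was applied to a length-21 array; it does not say which part of the model produced each one.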
Can anyone help me?
Thank you in advance!