Using DiscreteMarkovChain with batch dimensions and data of different lengths

Hi. I’m trying to model experimental data from multiple participants using a hidden Markov model based on pymc-extras’ DiscreteMarkovChain. I have a working model that uses a for loop to create a DiscreteMarkovChain distribution, emission matrix, and observed data for each participant, but it feels like quite an unwieldy solution. I’d like to simply use batch dimensions as I would in other models, but each participant has seen a different number of trials, which means each has a different-length Markov chain.
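Roughly, my current for-loop version looks like this (simplified; the variable names, data, and emission model are made up, and the exact pymc-extras import path and the `steps` semantics may vary by version):

import numpy as np
import pymc as pm
from pymc_extras.distributions import DiscreteMarkovChain

# Hypothetical data: per-participant trial counts and observations
trial_counts = [10, 15, 20, 12, 8]
observations = [np.zeros(n, dtype=int) for n in trial_counts]  # placeholder data

with pm.Model() as loop_model:
    transition_p = pm.Dirichlet("transition_p", a=np.ones(3), shape=(3, 3))
    emission_p = pm.Dirichlet("emission_p", a=np.ones(3), shape=(3, 3))

    for i, n in enumerate(trial_counts):
        init_dist = pm.Categorical.dist(p=np.ones(3) / 3)
        # Assuming chain length = steps + 1 (initial state plus `steps` transitions)
        chain = DiscreteMarkovChain(
            f"chain_{i}", P=transition_p, init_dist=init_dist, steps=n - 1
        )
        pm.Categorical(f"obs_{i}", p=emission_p[chain], observed=observations[i])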

The only solution I’ve been able to think of is to pad each vector up to the length of the longest, but the parameter I’m interested in is the transition probability, and the padded values would distort the transition estimates (by implying either more or fewer transitions than the participant actually made). Is there an elegant way to solve this problem, or should I stick with my for loop?

For this you can define your own distribution with CustomDist.

import numpy as np
import pymc as pm
import pytensor
import pytensor.tensor as pt


def dist(init_state, transition_p, end_t, size):
    def step(t, state_tm1, transition_p, end_t):
        # Sample the next state from the row of the transition matrix
        # indexed by each participant's previous state
        state_t = pm.Categorical.dist(p=transition_p[state_tm1])

        # If t >= end_t, we encode the state as -1.
        # We need to use a DiracDelta for PyMC to accept the logp from a `where`
        end_state = pm.DiracDelta.dist(-1, shape=state_t.shape)
        state_t = pt.where(t >= end_t, end_state, state_t)

        return state_t, pm.pytensorf.collect_default_updates(state_t)

    # Iterate over time steps up to the longest chain; shorter chains are
    # frozen at -1 once t reaches their end_t
    states, _ = pytensor.scan(
        fn=step,
        sequences=[pt.arange(end_t.max())],
        outputs_info=[init_state],
        non_sequences=[transition_p, end_t],
        strict=True,
    )
    return states


with pm.Model() as m:
    transition_p = pm.Dirichlet("transition_p", a=np.ones(3), shape=(3, 3))  # 3x3 transition matrix

    init_state = pt.as_tensor([0, 0, 0, 0, 0])  # Could be a prior
    end_t = pt.as_tensor([10, 15, 20, 12, 8])  # Observed 10, 15, 20, ... trials, respectively
    max_t = end_t.max()

    y = pm.CustomDist(
        "y",
        init_state,
        transition_p,
        end_t,
        dist=dist,
        # scan iterates over time, so the output is time-major: (max_t, n_participants)
        shape=(max_t, end_t.shape[0]),
    )

I assumed the data is padded with -1, but you can use whatever encoding you want. Note that beyond each participant’s end_t the states are a deterministic -1 (the DiracDelta), which contributes a constant zero to the logp, so the padding doesn’t distort the transition probability estimates.
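Concretely, a padded observed array matching that encoding could be built like this (hypothetical data; time-major to match the scan output):

import numpy as np

end_t = np.array([10, 15, 20, 12, 8])
max_t = end_t.max()

rng = np.random.default_rng(0)
padded = np.full((max_t, len(end_t)), -1)  # -1 beyond each participant's last trial
for i, n in enumerate(end_t):
    padded[:n, i] = rng.integers(0, 3, size=n)  # placeholder observed states

# Then pass it as observed data:
# y = pm.CustomDist("y", ..., dist=dist, observed=padded)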

Colab notebook with full code and parameter recovery: Google Colab

This model seems to run fast with the numba backend (bleeding-edge versions of pymc/pytensor), but whether it scales to your data I don’t know. Let me know if you find this approach to be much slower than the “wrong” one using the DiscreteMarkovChain from pymc-extras.
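If you want to try the numba backend yourself, one way in recent PyMC versions (assuming pm.sample still accepts compile_kwargs; this may differ across versions) is:

with m:
    idata = pm.sample(compile_kwargs={"mode": "NUMBA"})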

You can read more about using CustomDist to define timeseries: Time Series Models Derived From a Generative Graph — PyMC example gallery

Thank you so much!