Implementing late-entering series in a PyMC state-space model

pritam · May 15, 2026, 5:47am

Hi all,

I am trying to implement a multivariate state-space model in PyMC for compositional time-series data.

Suppose I have multiple related time series observed over a common time index, but some series only begin partway through the sample. For example:

Series A and B are observed from t=1
Series C only starts at t=50

Before t=50, Series C is genuinely unobserved/nonexistent.

The paper I am reading proposes the following:

e_{it} = \begin{cases} y_{it} - \hat{y}_{it}, & \text{if series } i \text{ is observed at time } t\\ 0 & \text{otherwise} \end{cases}

That is, the ‘errors’ for these run-in periods are forced to zero by the formula above.

The idea is that the state-space recursion can still be written using the full dimensionality of y_{t}, while pre-entry observations remain np.nan.
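As a minimal NumPy sketch of the formula above (the names `masked_errors`, `y`, and `y_hat` are made up for illustration and are not from the paper or from PyMC), the errors can be zeroed wherever the observation is NaN. Note that a zeroed error only stops that component from driving the state update; the boolean mask is still needed to drop those entries from the likelihood.

```python
import numpy as np

def masked_errors(y, y_hat):
    """Return e_it = y_it - y_hat_it where y_it is observed, 0 where it is NaN."""
    observed = ~np.isnan(y)
    # np.where evaluates both branches; the NaN differences are computed but discarded.
    errors = np.where(observed, y - y_hat, 0.0)
    return errors, observed

# Toy data: rows are time steps, columns are series A, B, C.
# Series C (last column) only enters at the third time step.
y = np.array([[1.0, 2.0, np.nan],
              [1.5, 2.5, np.nan],
              [2.0, 3.0, 0.5]])
y_hat = np.array([[0.8, 2.1, 0.4],
                  [1.4, 2.4, 0.4],
                  [2.1, 2.9, 0.6]])

errors, observed = masked_errors(y, y_hat)
```

Here `errors[:2, 2]` is zero for the two pre-entry steps of series C, and `observed` records which entries should count toward the likelihood.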

What I am unsure about is how to handle these missing/pre-entry error components in PyMC so that they do not update the state and do not enter the observed-data likelihood/objective.

What would be the cleanest way to implement this in PyMC? Any guidance or example patterns would be greatly appreciated.

There is a public implementation available on GitHub which is not built on PyMC. Admittedly I haven’t spent a lot of time with this problem and also know very little about PyMC. But looking at the codebase, I am trying to find a similar alternative for this code in PyMC.

ricardoV94 · May 15, 2026, 6:23am

Have you tried the pymc-extras statespace module? It handles missing data out of the box.

Links to some uses: pymc-extras/notebooks/Structural Timeseries Modeling.ipynb at main · pymc-devs/pymc-extras · GitHub

From a quick skim they don’t have missing data, but it’s just a question of passing nan

pritam · May 15, 2026, 7:21am

it’s just a question of passing nan

And making sure they are ignored during likelihood calculation instead of imputing them. How do I make sure they are ignored and not imputed? I did find this comment from another thread.

If you look here they are filtering out the rows and columns of the covariance matrix wherever nan values are there. I was wondering if I can reproduce this in PyMC and how.
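For reference, the row/column filtering idea can be sketched in plain NumPy as a single Kalman measurement update. This is a generic textbook-style illustration, not the code from the linked thread and not what pymc-extras does internally; the function name and argument names are hypothetical.

```python
import numpy as np

def kalman_update_skip_missing(a, P, y, Z, H):
    """One Kalman measurement update using only the non-NaN entries of y.

    a: predicted state mean (m,)      P: predicted state covariance (m, m)
    y: observation (p,), may hold NaN Z: design matrix (p, m)
    H: observation noise covariance (p, p)
    Returns the updated mean, updated covariance, and log-likelihood term.
    """
    obs = ~np.isnan(y)
    if not obs.any():
        # Nothing observed: the state is not updated and nothing is added to the likelihood.
        return a, P, 0.0
    Z_o = Z[obs]                       # keep rows of Z for observed series
    H_o = H[np.ix_(obs, obs)]          # keep matching rows and columns of H
    v = y[obs] - Z_o @ a               # innovation for observed series only
    F = Z_o @ P @ Z_o.T + H_o          # innovation covariance
    K = np.linalg.solve(F, Z_o @ P).T  # Kalman gain P Z' F^{-1} (F and P symmetric)
    a_new = a + K @ v
    P_new = P - K @ Z_o @ P
    _, logdet = np.linalg.slogdet(F)
    loglik = -0.5 * (obs.sum() * np.log(2 * np.pi) + logdet + v @ np.linalg.solve(F, v))
    return a_new, P_new, loglik

# Toy example: one state, two series, the second one not yet observed.
a0, P0 = np.array([0.0]), np.array([[1.0]])
Z, H = np.array([[1.0], [1.0]]), np.eye(2)
a1, P1, ll = kalman_update_skip_missing(a0, P0, np.array([2.0, np.nan]), Z, H)
```

With this kind of update the unobserved components neither move the state nor contribute to the likelihood, so nothing is imputed.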

ricardoV94 · May 15, 2026, 7:33am

What’s the concern if they are imputed?

ricardoV94 · May 15, 2026, 7:35am

Anyway, you can do a separate series for each, starting when the series really starts, and just share the structural parameters the same way you would in a same-length series? You don’t need a single likelihood.

jessegrabowski · May 15, 2026, 12:17pm

This is precisely what we do here

bob-carpenter · June 1, 2026, 6:09pm

I think I’m just repeating what @ricardoV94 said above in more words.

With PyMC, you can just code series C starting at t = 50 without any dummy entries or zero errors. You can even offset indices by 50 so you don’t have to pad data with NaN values.

If before t = 50, series C is genuinely non-existent rather than unknown, it doesn’t make sense to write the state-space recursion in (A, B, C) for t in 0:50 or to impute values of it for the missing data times. If values for t in 0:50 are just unobserved in series C, you can impute the first 50 values without playing any games with the errors and then just throw away the imputations if you don’t care about them. You can also add some weakly informative priors to the imputed values.
