What's the best way to implement an autoregressive moving average of a time series with a variable lag and window size?

You can handle multiple lags like is done in some stick-breaking processes. Assume a reasonable max number of lags and “disable the unused ones” with a masking operation.

Something like that is done here for another kind of models: Modeling temporal data with an unknown number of changepoints | Christopher Krapu

PyMC doesn’t really allow sampling of parameters that vary in shape, VAR or otherwise.

Whether any of this will be feasible to parametrize I have no idea.