Time series implementation questions

hi,

I am aware of the built-in time series distributions, but I have a question about how to accomplish something similar to the following Stan program:

parameters {
  real a;
  vector[100] x_t;
}
model {
  x_t[2:] ~ normal(a * x_t[:99], 0.1);
  // make predictions with x_t
}

In words, x_t is a lag 1 autoregressive time series implemented in a “centered” form. The equivalent non-centered form would be

parameters {
  real a;
  vector[100] z_t;
}
model {
  vector[100] x_t;
  z_t ~ std_normal();
  x_t[1] = z_t[1];
  for (i in 1:99) x_t[i+1] = a * x_t[i] + 0.1 * z_t[i+1];
  // make predictions with x_t
}

My understanding is that the PyMC distributions now all implement the second “non-centered” form, which is usually better for MCMC sampling because it decorrelates the parameters. But I wanted to know how I could implement the first variant by hand, for performance reasons. I wasn’t able to figure this out without resorting to pm.Potential. Thanks in advance!

There’s nothing wrong with using a pm.Potential if it achieves your goal. It’s a simple way to add an arbitrary logp term to a model. The reason you may want to use a RandomVariable (or the helper wrapper DensityDist) instead of a Potential is if you want to do prior/posterior predictive sampling or represent said RandomVariable nicely in the model graph.
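For example, a centered AR(1) like your Stan snippet could be written with a Potential roughly like this (just a sketch mirroring the Stan code, with improper flat priors and illustrative names; assumes the PyMC 4.x-era API):

import pymc as pm

with pm.Model() as centered_model:
    # Flat priors to mirror the Stan snippet, which leaves a and x_t without explicit priors
    a = pm.Flat("a")
    x_t = pm.Flat("x_t", shape=100)
    # Vectorized AR(1) term: x_t[i+1] ~ Normal(a * x_t[i], 0.1)
    pm.Potential(
        "ar_logp",
        pm.logp(pm.Normal.dist(mu=a * x_t[:-1], sigma=0.1), x_t[1:]).sum(),
    )

Here the sampler proposes the absolute values x_t directly, i.e., the centered form.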

Scan is the way to represent arbitrary looping in PyMC, and there’s nothing wrong with that either. Sometimes you can cook up a vectorized version that does not require looping, but that’s a separate matter.

I think all timeseries distributions in PyMC are implemented in a centered parametrization. That is, if they were unobserved, the sampler would have to propose absolute values, not relative / innovation values. Most of the time the timeseries variables are observed, so this is not a concern. There is an open issue about addressing this for the RandomWalk distributions (which, by the way, use a vectorization trick to avoid scanning): Implement DiffTransform for RandomWalk distributions · Issue #6098 · pymc-devs/pymc · GitHub


Thanks for the remarks. I don’t yet understand well enough how the scan with RNGs works in the PyMC source. I was trying to do without the Potential because the Muse package doesn’t support it, and I wanted to compare the inference algorithms for the centered and non-centered cases. I’ll keep at it and come back later.

edit: For instance, this is clearly the centered parameterization. But the Scan implementation above, in the rv_op method, seems redundant and potentially non-centered. The posterior geometries of the centered and non-centered forms are different, hence my curiosity.

Oh, I didn’t mean how the logp is implemented when I said centered, so I probably misunderstood your point. I meant that the logp function expects absolute “concrete” values as inputs, not innovations (which is all that would matter for sampler efficiency).

You probably meant something like whether internally we compute the logpdf inside a Scan graph, by traversing the values and accumulating individual logpdf terms, or whether we do a single vectorized call (directly, if that’s possible, or after “differencing” the values, whatever that means for the given timeseries).

The Scan in the rv_op is only concerned with generating random draws. You shouldn’t need to worry about RNGs if you want to compute the logpdf using (another) Scan inside the logp function. In that case you would probably be calling pm.logp(Normal.dist(some_mu, some_sigma), some_value) inside the Scan body, which will return a tensor expression without any RNGs whatsoever (it’s just the normal_logpdf expression).
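As a rough sketch (this is not the actual PyMC internals; the helper name is made up), the same AR(1) logp could be accumulated with a Scan like this:

import aesara
import pymc as pm

def ar1_logp_scan(value, a, sigma):
    # One logpdf term per step: value[i+1] ~ Normal(a * value[i], sigma)
    def step(x_next, x_prev, a, sigma):
        return pm.logp(pm.Normal.dist(mu=a * x_prev, sigma=sigma), x_next)

    # No RNGs involved: each term is just the normal logpdf expression
    logps, _ = aesara.scan(
        fn=step,
        sequences=[value[1:], value[:-1]],
        non_sequences=[a, sigma],
    )
    return logps.sum()

You could then use pm.Potential("ar_logp", ar1_logp_scan(x_t, a, 0.1)) in place of the vectorized Potential above.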

The way you compute the logp does not directly affect sampler performance: the geometry is exactly the same, because the inputs and outputs are exactly the same. You may, however, find that the Scan implementation (or its gradient) has a different speed or numerical precision than the vectorized implementation.

Thanks, that makes a lot more sense now!

That also makes sense. I get the use of Scan as a more general way than vectorization to keep the computational graph compact (it’s strange that Stan never got a scan).

I think we agree on this: for the current implementation, the sampler (or optimizer or whatever) is moving in the space of the values of the time series rv, not the space of the innovations.

That means there could be an alternative implementation which treats the innovations as the parameters and does a scan to compute the concrete values of the time series rv. That would be non-centered, and in principle better for effective sample size per second (with HMC at least). We also observed that the stochastic Heun method accelerates convergence despite requiring 2x the function calls, so I was aiming for that. I think I see how to do it now with scan.
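Roughly something like this, I think (a sketch against the Aesara-based PyMC 4.x API; the prior on a is just for illustration):

import aesara
import aesara.tensor as at
import pymc as pm

with pm.Model() as noncentered_model:
    a = pm.Normal("a", 0.0, 1.0)  # illustrative prior
    # The sampler moves in the space of the innovations z_t
    z_t = pm.Normal("z_t", 0.0, 1.0, shape=100)

    def step(z, x_prev, a):
        # x[i+1] = a * x[i] + 0.1 * z[i+1]
        return a * x_prev + 0.1 * z

    x_rest, _ = aesara.scan(
        fn=step,
        sequences=[z_t[1:]],
        outputs_info=[z_t[0]],  # x_t[1] = z_t[1], as in the Stan snippet
        non_sequences=[a],
    )
    # Concrete values of the time series, reconstructed from the innovations
    x_t = pm.Deterministic("x_t", at.concatenate([z_t[:1], x_rest]))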


Definitely. Usually we achieve that via variable transforms, so that the sampler proposes values in an “unconstrained” (or easier) space and we convert those to the “constrained” space (the same space where you “observe” or obtain random draws), adding any jacobian terms to make sure the user-defined prior is respected.
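The standard example: a positive scale parameter sigma is sampled as log_sigma on the whole real line, the prior logp is evaluated at exp(log_sigma), and the log-Jacobian term log_sigma is added. Done by hand (PyMC does this automatically for you), it would look something like this sketch:

import aesara.tensor as at
import pymc as pm

with pm.Model() as m:
    # Sampler proposes log_sigma on the unconstrained real line
    log_sigma = pm.Flat("log_sigma")
    sigma = pm.Deterministic("sigma", at.exp(log_sigma))
    # HalfNormal prior on sigma plus the log-Jacobian of the exp transform,
    # so the user-defined prior on sigma is respected
    pm.Potential("sigma_prior", pm.logp(pm.HalfNormal.dist(1.0), sigma) + log_sigma)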

However, this approach is not great for timeseries, because most of the time it means doubling the amount of work: first “adding the innovations” when going from the unconstrained to the constrained space, and then undoing that work to get back to the “innovations”, which we need to actually compute the logpdf. In the issue I linked, it’s a simple case of differencing a cumsum of values, and it should be easy to teach Aesara (our computational backend) to optimize away that redundant back and forth, but for arbitrary Scans that’s obviously not as easy.
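A toy illustration of that back and forth for the random-walk case (plain Aesara ops, not the actual transform code):

import aesara.tensor as at

innovations = at.vector("innovations")
# unconstrained -> constrained: build the "concrete" values from the innovations
values = at.cumsum(innovations)
# the logp then immediately needs the innovations back again
recovered = values[1:] - values[:-1]  # equals innovations[1:]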

I have considered having some way of providing both formats, depending on whether a timeseries (or any distribution) is observed or latent, but I haven’t figured out how that could be done yet.

Stan allows for arbitrary for loops (in its specialized domain-specific language). Scan is a constrained approximation to for loops (and while loops), because Python for loops allow for too much “magic”. Stan never needed a Scan in that sense.

I have vectorized too many nested Stan loops to agree with that :smiley:

For some loop-heavy models, the memory traffic for the AD graph dominates the calculation, so scan is a good API choice, IMO.


And I have vectorized too many Scan loops to agree with myself :smiley:


By the way, if you are interested in timeseries in PyMC, we have access to a nifty new library, Aeppl, that, among other things, tries to infer the logp of arbitrary Scans that represent the generative process of a timeseries (i.e., the random draws / the graph implemented in rv_op): GitHub - aesara-devs/aeppl: Tools for an Aesara-based PPL.

I tried to use that when refactoring the AR distribution and it did work. We ended up using the manual vectorized implementation because the generated Scan was too slow for now (something that the folks at Aesara are trying to iron out). Anyway, you might like to try it out. I have a notebook showing it within a PyMC model (warning: it has some black magic to glue things together that may no longer be needed / working):


Lots of interesting bits in the notebook, thanks. I’ll look closer at aeppl as well.