ZeroSumNormal - references?

Is there a good article to read to understand the math behind the implementation of PYMC’s ZeroSumNormal? I could not find a citation in the code or docs.

Thanks!

The best citation is this random discussion thread on the numpyro github page of all places.

Bug @aseyboldt to write more papers.

2 Likes

Thanks! The discussion there was quite helpful! And I look forward to the paper… nudge nudge @aseyboldt

1 Like

There’s also the section on degenerate multivariate normals from wikipedia that’s quite helpful for understanding how to compute the logp

1 Like

Stan implemented @aseyboltaseybolt’s implementation as constrained parameter type ‘zero_sum_vector’ and I wrote a case study which goes through the math; the Stan reference manual also goes over this: Constraint Transforms. here’s the case-study: The Sum-to-Zero Constraint in Stan

1 Like

Ah, this is very useful! I am also a Stan user so this will help.

This may also be helpful:

Normal distributed random variables with constraint? - Cross Validated

See the response and the reference in there to Kwong and Iglewicz 1996

1 Like

Indeed it is helpful, thanks!

1 Like

This is not how PyMC and Stan are defining sum-to-zero variables. We’re instead defining a new density. The easiest way to think about it is in terms of a change-of-variables. To have an N-vector that sums to zero, you only have N - 1 degrees of freedom—the last value has to be equal to the negative sum of the previous values.

A naive way to enforce this constraint is just this way. Use N - 1 free unconstrained variables and define the N-th variable to be the negation of the sum of the first N-1 values. Because it’s a linear constraint, the Jacobian is constant, so can be ignored for the purposes of sampling. The unnormalized density is

p(x) = \left(\prod_{n=1}^{N-1} \textrm{normal}(x_n \mid 0, \sigma)\right) \cdot \textrm{normal}(-\textrm{sum}(x_1, ..., x_{N-1}) \mid 0, \sigma).

The sampler only works over variables x_1,\ldots, x_{N-1}—the variable x_N is defined as a transform and plays a role in the density This density is not something that has marginal scale \sigma—the actual scales are slightly smaller due to the constraint (by a factor of \sqrt{(N - 1) / N}).

Despite being what we recommended for years for Stan, this turns out to not be a great way to do this transform geometrically. @aseyboldt and @spinkney figured out how to do this much better using the first half of the isometric log ratio transform. The isometric part is that it keeps the variables relatively independent in the unconstrained space where we sample.

The full derivation is is available in the Stan Reference Manual section on sum-to-zero constraints. It has worked way better than the naive approach in all of the examples where we’ve tried it. Thanks, @aseyboldt and @spinkney :slight_smile: !

5 Likes

Thanks @bob-carpenter!

Some clarification on:

Both PyMC and Stan use orthogonal transformations but PyMC uses Householder reflectors while Stan uses Helmert transforms (which underpin the isometric log ratio transform). Any orthogonal transformation matrix works here and we can convert Helmert to Householder and back with a suitable permutation matrix.

I think the Stan and PyMC versions will have similar speed and memory costs for the vector case. The matrix case was hand crafted in Stan to only do one double loop without forming any additional matrices. I think the householder one has a bit more computation but still doesn’t need to form additional matrices. The generalization to dimensions > 2 is already in PyMC while not in Stan (yet).

2 Likes

Thanks!

1 Like