Breaking symmetries in Bayesian MDS

Thank you @gBokiau for the detailed reply!

For background, the research question is: "how can I embed objects in a low-D space, when all I know about the objects is how (pair-wise) dissimilar they are from each other?" In practice, the observations of pair-wise dissimilarity are Bernoulli trials, where 1 means "subject thinks the two objects are the same" and 0 means "subject thinks the two objects are different", and I have many trials for most pairs of objects. The model is basically obs[i,j,n] ~ Bernoulli(p[i,j]) for the nth (binary) observation of dissimilarity between objects i and j, where p[i,j] is some function (anything that maps [0, Inf) to [0, 1]) of the distance between i and j in the low-D space.
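As one concrete (purely illustrative) choice of that link function, an exponential decay p = exp(-d) maps a distance d in [0, Inf) onto (0, 1], so coincident points are judged "same" with probability 1; the names here are mine, not from any library:

```python
import numpy as np

def same_prob(coords, i, j):
    """Probability that a subject judges objects i and j 'the same',
    as a decreasing function of their low-D distance.
    Exponential decay is just one illustrative link choice."""
    d = np.linalg.norm(coords[i] - coords[j])
    return np.exp(-d)

# Simulate the Bernoulli trials for one pair under this link.
rng = np.random.default_rng(0)
coords = rng.normal(size=(5, 2))          # 5 objects embedded in 2-D
p = same_prob(coords, 0, 1)
trials = rng.binomial(1, p, size=100)     # plays the role of obs[0, 1, :]
```

Any other monotone map from distance to probability (e.g. a logistic of the negative distance) would fit the same model skeleton.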

Regular MDS moves the objects around in the low-D space until the pair-wise estimated distances are all as self-consistent as possible (minimizing the strain), and typically assumes deterministic, continuous estimates of dissimilarity. Bayesian MDS (for which there is no standard implementation) allows improvements such as handling highly unbalanced observations (many more trials for some pairs i, j than for others), priors on all distances or on specific distances based on other information, accommodating noisy observations (of which a Bernoulli trial is certainly one example), and even a prior on the dimensionality itself.
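For reference, the non-Bayesian baseline (classical metric MDS) can be sketched in a few lines: double-center the squared-distance matrix and keep the top eigenvectors. This is the textbook construction, not anything specific to this thread:

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical MDS: embed n points in k dimensions from an (n, n)
    matrix D of pairwise distances, via double-centering and
    eigendecomposition of the implied Gram matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # double-centered Gram matrix
    w, V = np.linalg.eigh(B)              # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:k]         # take the k largest
    return V[:, idx] * np.sqrt(np.clip(w[idx], 0, None))

# Round-trip check: the embedding reproduces the input distances
# (up to rotation/reflection, which classical MDS leaves unidentified).
rng = np.random.default_rng(1)
X = rng.normal(size=(6, 2))
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
Y = classical_mds(D, k=2)
D2 = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
```

Note that even this deterministic version only identifies the configuration up to rotation, translation, and reflection, which is exactly the symmetry problem the Bayesian version inherits.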

I am starting with 2 dimensions, so indeed only the rotation symmetry needs to be handled for now. You said at one point that I should use "observed constraints" in the model definition, and later that centering the prior seemed entirely equivalent, so I'm not sure which you are recommending. Concretely, these would be either:

  • One specific point observed at (0, 0) for translation, another specific point at (0, whatever) for rotation.
    vs.
  • Prior for something (e.g. the center of mass) with mean (0,0) for translation, another prior for something else (e.g. the angle of the vector sum) with mean 0 for rotation.
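The first option amounts to a parameterization like the following (a sketch with names of my own invention): point 0 is pinned at the origin, point 1 is pinned to the y-axis, and only the remaining coordinates are free parameters, which removes the translation (2 dof) and rotation (1 dof) exactly:

```python
import numpy as np

def build_coords(r1, free):
    """Assemble the full (n, 2) coordinate array from unconstrained
    parameters. Point 0 is fixed at (0, 0) (kills translation); point 1
    is fixed at (0, r1), i.e. on the y-axis (kills rotation). `free`
    holds the (n - 2, 2) coordinates of the remaining points.
    If r1 is further constrained positive, that also picks one of the
    two rotations consistent with the axis."""
    anchor0 = np.zeros((1, 2))
    anchor1 = np.array([[0.0, r1]])
    return np.vstack([anchor0, anchor1, free])

free = np.array([[1.0, -0.5],
                 [2.0, 0.25]])
coords = build_coords(r1=1.5, free=free)   # 4 points, 2*4 - 3 = 5 free dof
```

The second option would instead leave all 2n coordinates free and add soft priors pulling the center of mass toward (0, 0) and the orientation toward 0, so the symmetry is only weakly identified rather than removed.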

Which do you think is better? And yes, only the distance between points (for all pairs of points) really matters in the end, and if needed I could always align the results from different chains before combining them.
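The post-hoc alignment of chains can be done with an orthogonal Procrustes step, a numpy-only sketch of which is below: remove each configuration's centroid, then rotate/reflect it onto a reference via the SVD of the cross-covariance:

```python
import numpy as np

def procrustes_align(X, ref):
    """Rigidly align configuration X to `ref`: subtract the centroid,
    then apply the optimal rotation/reflection (orthogonal Procrustes)."""
    Xc = X - X.mean(axis=0)
    Rc = ref - ref.mean(axis=0)
    U, _, Vt = np.linalg.svd(Xc.T @ Rc)   # SVD of the cross-covariance
    return Xc @ (U @ Vt)                  # optimal orthogonal map

# A rotated and shifted copy of `ref` aligns back exactly.
rng = np.random.default_rng(2)
ref = rng.normal(size=(5, 2))
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X = ref @ R.T + np.array([3.0, -1.0])
aligned = procrustes_align(X, ref)
```

Applied per draw (or per chain, using one chain's posterior mean as the reference), this makes coordinate summaries meaningful even when the model itself leaves the rigid-motion symmetry unconstrained.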

Lastly, do you have a recommendation for handling the positivity constraint that eliminates the reflection symmetry? It is a hard constraint, and I was worried about the infinite gradient it would imply at the boundary.
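One standard way to sidestep the hard constraint is to reparameterize: sample an unconstrained variable and map it through a smooth bijection such as softplus (or exp), so the sampler never touches the boundary and every gradient stays finite; the log-Jacobian of the transform is then added to the log-density. A minimal sketch (names are mine):

```python
import numpy as np

def softplus(u):
    """Smooth bijection from (-inf, inf) to (0, inf); its gradient,
    sigmoid(u), is finite everywhere."""
    return np.logaddexp(0.0, u)

def log_jacobian(u):
    """log |d softplus(u) / du| = log sigmoid(u). Added to the target
    log-density when the prior is placed on softplus(u) but the sampler
    moves in the unconstrained u."""
    return -np.logaddexp(0.0, -u)

# E.g. constrain one chosen point's x-coordinate to be positive,
# killing the reflection symmetry without a non-differentiable boundary.
u = -3.0
x_positive = softplus(u)
```

This is the same trick probabilistic-programming frameworks apply automatically to positively-constrained parameters, so declaring a single coordinate as positive in such a framework should already avoid the infinite-gradient worry.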