Identifiable up to rearrangement

Can NUTS help me find the set of parameters with the largest likelihood if my model is identifiable only up to rearrangement?

One of the parameters is a matrix \mathcal{A}. Let’s say matrix A_{1} is a possible value of \mathcal{A}.

I can form a new matrix A_{2} by the following two steps:

  1. swap the i^{\text{th}} and j^{\text{th}} rows of A_{1} to form B
  2. swap the i^{\text{th}} and j^{\text{th}} columns of B to form A_{2}

The likelihood of A_{1} is the same as the likelihood of A_{2}.
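To make the symmetry concrete, here is a minimal NumPy sketch of the two-step swap. The helper name `swap_rows_and_columns` and the 3x3 example matrix are illustrative assumptions, not part of the model. It also checks that the swap is the same as conjugating by a permutation matrix P, i.e. A_{2} = P A_{1} P^{T}.

```python
import numpy as np

def swap_rows_and_columns(A, i, j):
    """Swap rows i and j, then columns i and j, returning a new matrix."""
    B = A.copy()
    B[[i, j], :] = B[[j, i], :]   # step 1: swap rows i and j
    B[:, [i, j]] = B[:, [j, i]]   # step 2: swap columns i and j
    return B

# Illustrative 3x3 matrix whose rows are Dirichlet draws (each row sums to 1).
rng = np.random.default_rng(0)
A1 = rng.dirichlet(np.ones(3), size=3)
A2 = swap_rows_and_columns(A1, 0, 2)

# The same operation written as conjugation by a permutation matrix P.
P = np.eye(3)[[2, 1, 0]]
assert np.allclose(A2, P @ A1 @ P.T)
```

Any sequence of such swaps is again of the form P A P^{T}, so a k-by-k matrix has up to k! equivalent arrangements, each a separate mode of the posterior.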

I’m using a Dirichlet prior for \mathcal{A}.

Something like transform.ordered might work, but I would need to sort both rows and columns by the ordering of the row sums.

Nope, most likely NUTS will try to sample the whole space and fail; see e.g. https://mc-stan.org/users/documentation/case-studies/identifying_mixture_models.html

One solution I can think of is to port the permutation bijector from TFP as a transform in PyMC3.



Is the following post-processing of the trace enough to fix this problem?

For each value of matrix \mathcal{A} in the trace,

  1. Calculate indices = np.argsort(np.sum(A, axis=0))
  2. Reorder the matrix by A = A[indices, :][:, indices]
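The two steps above can be sketched as follows. This is a minimal sketch, assuming the draws of \mathcal{A} have already been pulled out of the trace into a NumPy array of shape (n_draws, k, k); the names `canonicalize` and `canonicalize_trace` are hypothetical helpers, not PyMC3 functions.

```python
import numpy as np

def canonicalize(A):
    """Reorder rows and columns of a square matrix by ascending column sum."""
    indices = np.argsort(np.sum(A, axis=0), kind="stable")
    # np.ix_ applies the same ordering to rows and columns,
    # equivalent to A[indices, :][:, indices].
    return A[np.ix_(indices, indices)]

def canonicalize_trace(samples):
    """Apply canonicalize to every draw in an array of shape (n_draws, k, k)."""
    return np.stack([canonicalize(A) for A in samples])

# Illustrative check: a matrix and a relabelled copy map to the same result.
rng = np.random.default_rng(1)
A1 = rng.dirichlet(np.ones(4), size=4)
perm = np.array([2, 0, 3, 1])
A2 = A1[np.ix_(perm, perm)]
relabelled = canonicalize_trace(np.stack([A1, A2]))
assert np.allclose(relabelled[0], relabelled[1])
```

One caveat with this ordering: when two column sums are nearly equal, small sampling noise can flip their order between draws, so the relabelled draws can still mix labels in that case.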

Well, it is certainly one way to do it, but if you have within trace mode exploring, then it would be quite difficult.

Would the identifiability problem mess up the tuning of the mass matrix and NUTS?
What is “within trace mode exploring”?

I would say no, as the mass matrix estimation is over the whole posterior, so label switching won’t affect that.

I meant mode switching within a chain.

Would the permutation bijector avoid the mode switching problem?

I think mode switching within a chain would affect estimation of the distribution.
But I only want to find the best parameters for the model and don’t need an estimate of the variance right now, so I guess the post-processing would work for me.
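Since rearranging \mathcal{A} leaves the likelihood unchanged, picking the single best draw does not depend on which mode a chain happens to be in. Below is a minimal sketch, assuming the draws and one log-likelihood value per draw are available as NumPy arrays. The arrays here are made-up stand-ins and `best_draw` is a hypothetical helper.

```python
import numpy as np

def best_draw(samples, log_likelihoods):
    """Return the draw with the highest log-likelihood and its index."""
    best = int(np.argmax(log_likelihoods))
    return samples[best], best

# Made-up stand-ins for a real trace: 5 draws of a 3x3 matrix and
# one log-likelihood value per draw.
rng = np.random.default_rng(2)
samples = rng.dirichlet(np.ones(3), size=(5, 3))
log_likelihoods = np.array([-12.3, -10.1, -11.7, -9.8, -10.4])

A_best, idx = best_draw(samples, log_likelihoods)
```

A sampler is not an optimizer, so the best draw only approximates the maximum; it could serve as a starting point for a dedicated optimizer if a sharper estimate is needed.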