Factor analysis with wrong number of latent variables

Hello,

I’m trying to do an analysis similar to the one in the example notebook factor_analysis (you can find it here). The example does a good job of showing how to fit a model with k latent variables (2 in the example) if your data was generated from a latent variable model using the same k.

However, what we usually want from a factor analysis is more than that. We would like it to converge for values of k other than the true one, and then to prefer the model with the correct number of latent variables.

To explore this, I changed the number of latent variables from 2 to 3 between the simulation and the fit in the notebook. Thus, we are trying to fit a model with 3 latent variables to data that was generated with only 2. This does not converge (see below). Does anyone have ideas on how best to fix it?

I’m only including the changed code and the output from fitting the model.

Changed code:

k = 3  # the data were simulated with k = 2; makeW, n, d, and Y are defined earlier in the notebook
coords = {"latent_columns": np.arange(k), "rows": np.arange(n), "observed_columns": np.arange(d)}

with pm.Model(coords=coords) as PPCA_identified:
    W = makeW(d, k, ("observed_columns", "latent_columns"))
    F = pm.Normal("F", dims=("latent_columns", "rows"))
    psi = pm.HalfNormal("psi", 1.0)
    X = pm.Normal("X", mu=at.dot(W, F), sigma=psi, observed=Y, dims=("observed_columns", "rows"))
    trace = pm.sample(tune=2000, cores=1, chains=4)  # target_accept=0.9

for i in trace.posterior.chain:
    samples = trace.posterior["W"].sel(chain=i, observed_columns=3, latent_columns=1)
    plt.plot(samples, label="Chain {}".format(int(i) + 1))

plt.legend(ncol=4, loc="lower center", fontsize=8)
plt.xlabel("Sample")
Output:
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Sequential sampling (4 chains in 1 job)
NUTS: [W_z, W_b, F, psi]

 100.00% [3000/3000 01:36<00:00 Sampling chain 0, 103 divergences]

 100.00% [3000/3000 01:39<00:00 Sampling chain 1, 6 divergences]

 100.00% [3000/3000 01:38<00:00 Sampling chain 2, 91 divergences]

 100.00% [3000/3000 01:56<00:00 Sampling chain 3, 91 divergences]
Sampling 4 chains for 2_000 tune and 1_000 draw iterations (8_000 + 4_000 draws total) took 411 seconds.
There were 103 divergences after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.689, but should be close to 0.8. Try to increase the number of tuning steps.
There were 109 divergences after tuning. Increase `target_accept` or reparameterize.
There were 200 divergences after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.6792, but should be close to 0.8. Try to increase the number of tuning steps.
There were 291 divergences after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.6768, but should be close to 0.8. Try to increase the number of tuning steps.
c:\miniconda3\envs\pymc-examples\lib\site-packages\IPython\core\events.py:89: UserWarning: constrained_layout not applied because axes sizes collapsed to zero.  Try making figure larger or axes decorations smaller.
  func(*args, **kwargs)
c:\miniconda3\envs\pymc-examples\lib\site-packages\IPython\core\pylabtools.py:151: UserWarning: constrained_layout not applied because axes sizes collapsed to zero.  Try making figure larger or axes decorations smaller.
  fig.canvas.print_figure(bytes_io, **kw)

![output|690x332](upload://v9WPjqiAt7Abuyv4HFlJcDIg6pw.png)

I couldn’t figure out how to upload the image, but it shows the traces of the chains not converging. :neutral_face:

I’m sorry, my mistake. Increasing the number of tuning steps and the target acceptance rate, as the error messages suggest, solves the problem.
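For reference, this is roughly the sampling call that worked for me (the exact values are my own choice, not from the notebook):

with PPCA_identified:
    # longer adaptation and a higher acceptance target got rid of the divergences
    trace = pm.sample(tune=5000, target_accept=0.95, cores=1, chains=4)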

Just a comment, no solution: I think the model comparison you suggest is smart, and this is a useful request.
I have to admit that I never got into factor analysis; the confusing frequentist terminology usually puts me off when I try. I also did not work on the example you link.

However, what I find intuitive is that if you have three latent variables covering two factors, they might jump between the local optima, and that could be the reason for the lack of convergence.
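A quick way to check for that kind of mode jumping (just a sketch, assuming the trace from your post and ArviZ imported as az):

import arviz as az

# per-chain traces of the loadings: chains sitting in different modes
# show up as separate bands at different levels
az.plot_trace(trace, var_names=["W"], compact=True)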

Just some thoughts:

  • have you tried increasing to four latent variables, to see if that changes anything?
  • is it possible to increase or decrease the noise, i.e., make the simulated data less or more sharp, so that the local optima are less or more distinct? (I’m thinking in terms of Normals here, but you get the point; see the sketch after this list)
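To illustrate what I mean in the second point, a minimal sketch of the kind of simulation knob I have in mind (all names and sizes here are hypothetical, not from the notebook):

import numpy as np

rng = np.random.default_rng(42)
d, k_true, n = 10, 2, 1000  # hypothetical dimensions
W_true = rng.normal(size=(d, k_true))  # true loadings
F_true = rng.normal(size=(k_true, n))  # true factors
sigma_true = 2.0  # increase for noisier data (flatter optima), decrease for sharper
Y = W_true @ F_true + rng.normal(scale=sigma_true, size=(d, n))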

A few things:

  1. I posted too soon (sorry!): when I increased the tune and target_accept parameters, the sampling did converge without divergences.
  2. The problem of the number of model latent variables not matching the number of real latent variables arises both when it is too large and when it is too small. I believe this is an active area of research, but I’m happy to get input (see the sketch after this list).
    • When your model has too few latent variables, you create multimodality in the posterior, with modes where the model latents cover different subsets of the true latent variables, and more modes at averages between latent variables.
    • When the model has too many latent variables, you get a different kind of multimodality, where multiple model latent variables collapse onto one true latent variable.
  3. Since all of these issues involve multimodality in the posterior, I wonder whether there has been work on covering different modes of the posterior with different chains.
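Coming back to the model-comparison idea: one pragmatic approach is to fit the model for several values of k and compare the fits with LOO. A rough sketch, assuming a hypothetical build_model(k) helper that wraps the notebook’s model-building code (depending on your PyMC/ArviZ versions you may need to request the log likelihood explicitly, as below):

import arviz as az

traces = {}
for k in (1, 2, 3):
    with build_model(k):  # hypothetical helper wrapping the notebook's model code
        traces[f"k={k}"] = pm.sample(
            tune=5000,
            target_accept=0.95,
            idata_kwargs={"log_likelihood": True},  # needed for LOO in recent PyMC
        )

az.compare(traces)  # ranks the models by expected log predictive density

That said, LOO can be unreliable when the posterior is multimodal, so I would treat the ranking as a rough guide rather than a decision rule.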

Thanks in any case!
Opher