Unique solution for probabilistic PCA

Crazy but true: allowing for covariances between the factors does alter the point estimates for the means. I haven’t the foggiest idea why this should be the case.
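One sanity check that might help narrow this down, as a toy sketch rather than anything taken from the model in this thread: if the target were exactly Gaussian, a factorised (mean-field) Gaussian fit and a full-covariance fit would agree on the means and differ only in the variances. The 2-D target below, its numbers, and the helper names `mean_field_means` and `true_marginal_variances` are all made up for illustration. If the means move once covariances are allowed, that hints (but does not prove) that the posterior is not Gaussian in the fitted space, or that one of the fits has not converged.

```python
# Toy 2-D Gaussian target N(mu, inv(Lam)); all numbers are illustrative assumptions.
mu = (1.0, -2.0)
Lam = ((2.0, 1.2), (1.2, 1.5))  # precision matrix (symmetric, positive definite)


def mean_field_means(mu, Lam, n_iter=200):
    """Coordinate-ascent updates for the means of a factorised q(z1) q(z2)."""
    m1, m2 = 0.0, 0.0  # arbitrary starting point
    for _ in range(n_iter):
        m1 = mu[0] - Lam[0][1] / Lam[0][0] * (m2 - mu[1])
        m2 = mu[1] - Lam[1][0] / Lam[1][1] * (m1 - mu[0])
    return m1, m2


def true_marginal_variances(Lam):
    """Diagonal of inv(Lam) for a 2x2 matrix, i.e. the exact marginal variances."""
    det = Lam[0][0] * Lam[1][1] - Lam[0][1] * Lam[1][0]
    return Lam[1][1] / det, Lam[0][0] / det


mf_means = mean_field_means(mu, Lam)
mf_vars = (1.0 / Lam[0][0], 1.0 / Lam[1][1])  # mean-field optimum: 1 / Lam_ii
full_vars = true_marginal_variances(Lam)

print("mean-field means:         ", mf_means)
print("target means:             ", mu)
print("mean-field variances:     ", mf_vars)
print("full-covariance variances:", full_vars)
```

In this toy case the mean-field means converge to the target means while the mean-field variances come out smaller than the exact marginal variances, so only the spread changes when covariances are allowed.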

Average Loss = 3,600.2: 100%|██████████| 125000/125000 [04:07<00:00, 504.57it/s]
Finished [100%]: Average Loss = 3,600.2
Average Loss = 3,600.1: 100%|██████████| 125000/125000 [03:03<00:00, 680.25it/s]
Finished [100%]: Average Loss = 3,600.1
Average Loss = 3,601.1: 100%|██████████| 125000/125000 [03:06<00:00, 671.65it/s]
Finished [100%]: Average Loss = 3,601.2
Average Loss = 3,580.2: 100%|██████████| 25000/25000 [07:23<00:00, 56.43it/s]
Finished [100%]: Average Loss = 3,580.2
Average Loss = 3,580.2: 100%|██████████| 25000/25000 [07:08<00:00, 58.37it/s]
Finished [100%]: Average Loss = 3,580.2
Average Loss = 3,580.5:  30%|██▉       | 7422/25000 [02:17<06:17, 46.57it/s]
Interrupted at 7,425 [29%]: Average Loss = 3,583.1
Average Loss = 3,937.5: 100%|██████████| 100000/100000 [38:53<00:00, 42.86it/s]
Finished [100%]: Average Loss = 3,937.5
Average Loss = 3,930.3: 100%|██████████| 100000/100000 [34:40<00:00, 48.06it/s]
Finished [100%]: Average Loss = 3,930.2
Average Loss = 3,583.3: 100%|██████████| 25000/25000 [14:05<00:00, 29.56it/s]
Finished [100%]: Average Loss = 3,583.3
Average Loss = 3,583.9: 100%|██████████| 25000/25000 [14:14<00:00, 29.24it/s]
Finished [100%]: Average Loss = 3,583.9
Only 250 samples in chain.
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Sequential sampling (1 chains in 1 job)
NUTS: [err_d, W_od, del, F]
100%|██████████| 3250/3250 [01:44<00:00, 31.11it/s]
Only one chain was sampled, this makes it impossible to run some convergence checks
Only 250 samples in chain.
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Sequential sampling (1 chains in 1 job)
NUTS: [err_d, W_od, del, F]
100%|██████████| 3250/3250 [02:43<00:00, 19.82it/s]
Only one chain was sampled, this makes it impossible to run some convergence checks
Only 250 samples in chain.
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Sequential sampling (1 chains in 1 job)
NUTS: [err_d, W_od, del, F]
100%|██████████| 3250/3250 [02:18<00:00, 23.39it/s]
Only one chain was sampled, this makes it impossible to run some convergence checks
Only 250 samples in chain.
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Sequential sampling (1 chains in 1 job)
NUTS: [err_d, W_od, del, F]
100%|██████████| 3250/3250 [02:12<00:00, 24.50it/s]
Only one chain was sampled, this makes it impossible to run some convergence checks

ADVI:

image

FR-ADVI (full-rank ADVI):

image

NUTS:

image

Are you interested in writing it up into a doc?

What would that entail?