I checked the shape of trace for missing values and it matched with #missing in data.
To figure out which rows the algo was using, I did another simulation where instead of random missing, I purposely set each row having one missing value. Turned out still the algo was able to recover the cov matrix and at the same time generate dist. for missing values which was better than trivial imputation. In other words I think the algo is smart to use partially missing rows to do inference but I am not sure how it did it. Couldn’t figure out a generative process from it.
Another observation is the corr estimation was better for pair of columns where co-non-missing rate is high, which makes me wonder did the algo use pairwise marginal distribution to do inference instead of the joint?