I have a 100-dimensional input variable xi, and I used both Metropolis() and DEMetropolis() to sample so that the output f(xi) conforms to the observed data. If I use Metropolis() with tune=1000 and 5000 draws across 4 chains, I get 5000*4 = 20000 posterior xi samples. However, when I use numpy.unique to check the posterior xi, I find that only 577 out of 20000 are unique, which means there are a lot of repeated xi. Similarly, with DEMetropolis(), only 3412 out of 20000 are unique. I wonder if anyone could provide some suggestions on how to address this problem. Thank you!
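For reference, the uniqueness check described above can be done on the stacked draws with `np.unique(..., axis=0)`, which treats each row (one xi vector) as a single item. This is a minimal sketch with fake data: the array shapes and the deliberate duplication are assumptions made purely to mimic the situation, not output from my actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fake "posterior" draws: 20000 samples of a 100-dimensional xi,
# built from only 577 distinct vectors to mimic rejected proposals.
distinct = rng.normal(size=(577, 100))
idx = rng.integers(0, 577, size=20000)
posterior_xi = distinct[idx]            # shape (20000, 100)

# axis=0 is essential here: without it, np.unique flattens the array
# and counts unique scalars instead of unique xi vectors.
unique_rows = np.unique(posterior_xi, axis=0)
print(unique_rows.shape[0], "unique out of", posterior_xi.shape[0])
```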
They are not unique because some proposals got rejected - this is what happens with Metropolis, as it is inefficient in high dimensions, but it is true for MCMC more generally: even NUTS is usually tuned to around an 80% acceptance rate, which means that 20% of the time the proposal is rejected and the sample is identical to the previous one.
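To make the mechanism concrete, here is a toy random-walk Metropolis sampler in pure numpy (a 1-D standard normal target, not the PyMC internals): whenever a proposal is rejected, the *current* value is stored again, so the trace naturally contains repeats.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):
    # standard normal log-density, up to a constant
    return -0.5 * x**2

def metropolis(n_samples, step=1.0, x0=0.0):
    samples = np.empty(n_samples)
    x = x0
    for i in range(n_samples):
        proposal = x + rng.normal(scale=step)
        # accept with probability min(1, p(proposal) / p(x))
        if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
            x = proposal
        # on rejection x is unchanged, so the previous value is stored again
        samples[i] = x
    return samples

draws = metropolis(5000)
print(len(np.unique(draws)), "unique values out of", len(draws))
```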
Thank you @junpenglao. So I probably misunderstood what the trace stores for us. I thought the trace stores only the accepted samples based on the Metropolis-Hastings criterion. So if there are 5000 samples in my trace and my rejection rate is 20%, that would mean the algorithm tried 6000 proposals in total. Could you correct me if I am wrong?
And ideally I want all the posterior samples in the trace to be different, because I want them to capture the uncertainty of the problem.
Thank you!
Those are the posterior samples - you don't want to keep only the accepted samples, otherwise your posterior will be wrong! See point 2 in https://umbertopicchini.wordpress.com/2017/12/18/tips-for-coding-a-metropolis-hastings-sampler/
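The repeats are doing real work: they weight each visited state by how long the chain stayed there. A rough way to see this is to compare the full chain with a deduplicated one in the same toy random-walk Metropolis sketch as above (a 1-D standard normal target; the step size is deliberately large to force many rejections, which is an assumption of this illustration). Dropping the duplicates discards the weighting and distorts the posterior.

```python
import numpy as np

rng = np.random.default_rng(42)

def log_target(x):
    # standard normal log-density, up to a constant
    return -0.5 * x**2

def metropolis(n_samples, step, x0=0.0):
    out = np.empty(n_samples)
    x = x0
    for i in range(n_samples):
        proposal = x + rng.normal(scale=step)
        if np.log(rng.uniform()) < log_target(proposal) - log_target(x):
            x = proposal
        out[i] = x
    return out

# Deliberately large step -> many rejections -> many repeated values.
draws = metropolis(20000, step=5.0)
dedup = np.unique(draws)

# The full chain targets N(0, 1); the deduplicated values lose the
# sojourn-time weighting, so their spread no longer matches the target.
print("full-chain variance:   ", np.var(draws))
print("deduplicated variance: ", np.var(dedup))
```

High-density states get their proposals rejected more often and so appear many times; deduplicating underweights the mode and inflates the apparent spread.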