Well, in the paper I linked you can see that the adaptation I refer to is called DRAM, i.e. AM + DR, with:
AM:
The intuition behind the AM [Adaptive Metropolis] is that, on-line tuning the proposal distribution in a MH can be based on the past sample path of the chain. Due to this form of adaptation the resulting sampler is neither Markovian nor reversible. In Haario et al. (2001) the authors prove, from first principles, that, under some regularity conditions on the way adaptation is performed and if the target distribution is bounded on a bounded support, the AM retains the desired stationary distribution.
For instance, the sigma (covariance) of the Gaussian proposal distribution is adapted during the run, based on the chain's history.
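As a minimal numerical sketch (my own illustration, not the toolbox's code), the Haario et al. (2001) update sets the proposal covariance to the scaled empirical covariance of the past sample path, regularised so it never collapses:

```python
import numpy as np

def am_proposal_cov(history, eps=1e-6):
    """Adaptive Metropolis proposal covariance (Haario et al., 2001):
    scaled empirical covariance of the past sample path, regularised
    with a small eps*I so the proposal never degenerates.
    `history` is an (n, d) array of past states; function name is mine."""
    d = history.shape[1]
    s_d = 2.4 ** 2 / d  # dimension-dependent scaling used in the paper
    return s_d * (np.cov(history, rowvar=False) + eps * np.eye(d))
```

At each step, a Gaussian proposal with this covariance replaces the fixed-sigma proposal of plain Metropolis.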
DR:
Delayed rejection (DR) is a way of modifying the standard Metropolis-Hastings algorithm (MH) (Tierney, 1994) to improve efficiency of the resulting MCMC estimators relative to Peskun (1973) and Tierney (1998) asymptotic variance ordering. The basic idea is that, upon rejection in a MH, instead of advancing time and retaining the same position, a second stage move is proposed. The acceptance probability of the second stage candidate is computed so that reversibility of the Markov chain relative to the distribution of interest, π, is preserved. The process of delaying rejection can be iterated for a fixed or random number of stages. Higher stage proposals are allowed to depend on the candidates so far proposed and rejected. Thus DR allows partial local adaptation of the proposal within each time step of the Markov chain still retaining the Markovian property and reversibility.
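A sketch of a single two-stage DR step for a 1-D target with Gaussian random-walk proposals (function name and scales are my own illustration, not the paper's code); the second-stage acceptance probability is built exactly so that reversibility with respect to pi is preserved:

```python
import numpy as np

def dr_step(x, log_pi, rng, scale1=1.0, scale2=0.2):
    """One delayed-rejection Metropolis step: on rejection of the stage-1
    candidate, a second, narrower stage is tried instead of staying put.
    1-D sketch; `log_pi` is the log of the target density."""
    # --- stage 1: ordinary Metropolis with a symmetric proposal ---
    y1 = x + scale1 * rng.normal()
    a1 = min(1.0, np.exp(log_pi(y1) - log_pi(x)))
    if rng.uniform() < a1:
        return y1
    # --- stage 2: smaller move; acceptance keeps reversibility w.r.t. pi ---
    y2 = x + scale2 * rng.normal()
    # alpha1 evaluated from y2's point of view, as the DR formula requires
    a1_rev = min(1.0, np.exp(log_pi(y1) - log_pi(y2)))
    log_q_num = -0.5 * ((y1 - y2) / scale1) ** 2  # q1(y2 -> y1), log, up to const
    log_q_den = -0.5 * ((y1 - x) / scale1) ** 2   # q1(x  -> y1), log, up to const
    num = log_pi(y2) + log_q_num + np.log(max(1.0 - a1_rev, 1e-300))
    den = log_pi(x) + log_q_den + np.log(max(1.0 - a1, 1e-300))
    a2 = min(1.0, np.exp(num - den))
    return y2 if rng.uniform() < a2 else x
```

The `1 - alpha1` factors are what make the chain reversible despite the second stage peeking at the rejected candidate.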
So what they propose with DRAM is:
The Adaptive Metropolis (AM) algorithm is the global adaptive strategy we will combine with the local adaptive strategy provided by the DR.
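Putting the two together, a toy 1-D DRAM-style loop (my own sketch, not the paper's or the toolbox's implementation) adapts the stage-1 scale from the chain history and falls back to a shrunk second stage on rejection:

```python
import numpy as np

def dram_1d(log_pi, n_steps=20000, gamma=0.2, adapt_start=500, seed=0):
    """Minimal 1-D DRAM-flavoured sampler: the stage-1 proposal scale is
    adapted from the past sample path (AM), and on rejection a second
    stage with a gamma-shrunk scale is tried (DR). A sketch only."""
    rng = np.random.default_rng(seed)
    x, chain = 0.0, []
    s1 = 1.0                                  # stage-1 proposal std
    for t in range(n_steps):
        if t > adapt_start:                   # AM: adapt from the history
            s1 = 2.4 * np.std(chain) + 1e-6
        y1 = x + s1 * rng.normal()
        a1 = min(1.0, np.exp(log_pi(y1) - log_pi(x)))
        if rng.uniform() < a1:
            x = y1
        else:                                 # DR: narrower second stage
            s2 = gamma * s1
            y2 = x + s2 * rng.normal()
            a1_rev = min(1.0, np.exp(log_pi(y1) - log_pi(y2)))
            num = (log_pi(y2) - 0.5 * ((y1 - y2) / s1) ** 2
                   + np.log(max(1.0 - a1_rev, 1e-300)))
            den = (log_pi(x) - 0.5 * ((y1 - x) / s1) ** 2
                   + np.log(max(1.0 - a1, 1e-300)))
            if rng.uniform() < min(1.0, np.exp(num - den)):
                x = y2
        chain.append(x)
    return np.array(chain)
```

The AM part keeps the global scale sensible; the DR part rescues individual steps that would otherwise get rejected, which is exactly the global/local split the quote describes.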
and in my case, the toolbox using this concept that I linked earlier works better than the current standard Metropolis implementation in PyMC. Upon close inspection of the code, I believe this is mostly due to the AM part tweaking the sigma of the proposal, which gets me 'unstuck', whereas I stay stuck with PyMC's Metropolis.
Actually, I also just tried DEMetropolis on my problem and it doesn't really give better results either: I barely get the parameter values within the 2-sigma range after hundreds of thousands of samples.
I should still try SMC to check, but if you have other suggestions I'm all ears.