At the risk of sounding stupid (so please correct me if I’m wrong): my reading of the paper is that, if you’ve implemented the data sampler for p(\mathrm{data}|\theta) correctly, the data-averaged posterior should match the prior, regardless of the particular prior. Any discrepancy between the two then suggests a problem with the data sampler, with the likelihood (Monte Carlo) sampler, or with both.
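To make that concrete, here is a minimal sketch of the check on a toy model of my own choosing, not the one from the paper: a Normal(0, 1) prior on theta with unit-variance Normal observations, where the posterior is known in closed form, so exact draws stand in for the Monte Carlo sampler. The function name `sbc_ranks` and all the settings are assumptions made for illustration. If everything is consistent, the rank of each prior draw among its posterior draws should be roughly uniform, and posterior draws pooled across simulated datasets should look like the prior.

```python
import numpy as np

def sbc_ranks(n_sims=2000, n_obs=10, n_post=99, seed=0):
    """Toy calibration check (assumed model: theta ~ N(0, 1), y_i ~ N(theta, 1))."""
    rng = np.random.default_rng(seed)
    ranks = np.empty(n_sims, dtype=int)
    pooled = np.empty(n_sims)
    for s in range(n_sims):
        theta = rng.normal(0.0, 1.0)              # draw from the prior
        y = rng.normal(theta, 1.0, size=n_obs)    # data sampler p(data | theta)
        # Conjugate posterior for this toy model; exact draws replace MCMC here.
        post_var = 1.0 / (n_obs + 1.0)
        post_mean = y.sum() * post_var
        draws = rng.normal(post_mean, np.sqrt(post_var), size=n_post)
        ranks[s] = np.sum(draws < theta)          # rank of the prior draw, 0..n_post
        pooled[s] = draws[0]                      # one posterior draw per dataset
    return ranks, pooled

ranks, pooled = sbc_ranks()

# Data-averaged posterior vs. prior: mean and variance should be near 0 and 1.
print("pooled mean:", pooled.mean(), "pooled var:", pooled.var())

# Rank histogram in 10 equal bins; each bin should hold about n_sims / 10 ranks.
counts = np.bincount(ranks // 10, minlength=10)
print("rank counts per bin:", counts)
```

Breaking either sampler on purpose (say, simulating data with the wrong variance) should tilt or bend the rank histogram, which is the kind of signal the calibration plot is picking up.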
Even though the paper describes that calibration plot as evidence of left-skew (under-estimation, clearly visible in the quantile-quantile plot), my first check would be the traces, to see if there are autocorrelations or divergences in the sampler. If so, I'd tune for a bit longer and potentially thin the trace.
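For the autocorrelation part, something along these lines is what I have in mind. It is only a sketch: the AR(1) series is a stand-in for a sticky trace, the helper names `autocorr` and `crude_ess` are made up for this post, and the effective-sample-size estimate is a deliberately crude one (sum the autocorrelations up to the first non-positive lag). Divergences are specific to HMC-type samplers and would have to come from the sampler's own diagnostics, so they are not covered here.

```python
import numpy as np

def autocorr(x, max_lag):
    """Sample autocorrelation of a 1-D trace at lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([1.0 if k == 0 else np.dot(x[:-k], x[k:]) / denom
                     for k in range(max_lag + 1)])

def crude_ess(x, max_lag=200):
    """Rough effective sample size: n / (1 + 2 * sum of leading positive autocorrelations)."""
    rho = autocorr(x, max_lag)
    tau = 1.0
    for r in rho[1:]:
        if r <= 0.0:          # stop at the first non-positive lag
            break
        tau += 2.0 * r
    return len(x) / tau

# Stand-in for a poorly mixing trace: AR(1) with lag-1 correlation phi (assumed value).
rng = np.random.default_rng(1)
phi = 0.9
trace = np.empty(5000)
trace[0] = rng.normal()
for t in range(1, len(trace)):
    trace[t] = phi * trace[t - 1] + rng.normal(scale=np.sqrt(1.0 - phi ** 2))

thinned = trace[::10]         # keep every 10th draw

print("lag-1 autocorr, full trace:   ", autocorr(trace, 1)[1])
print("lag-1 autocorr, thinned trace:", autocorr(thinned, 1)[1])
print("crude ESS, full trace:", crude_ess(trace), "of", len(trace))
```

Thinning doesn't add information, it just makes the retained draws closer to independent, which is what a rank-based calibration check implicitly assumes. If the full trace has an effective sample size far below its length, I'd sort that out before reading anything into the skew.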