What is the purpose of a burned trace?

I’m reading [Probabilistic Programming for Hackers](https://nbviewer.jupyter.org/github/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/blob/master/Chapter2_MorePyMC/Ch2_MorePyMC_PyMC3.ipynb) and I’m wondering why he used the first few data points from the trace as a posterior distribution.

    step = pm.Metropolis()
    trace = pm.sample(20000, step=step)

Random-walk Metropolis-Hastings samplers usually converge to the typical set very slowly in high dimensions, so it is good practice to discard the first few thousand samples, which are not yet in the typical set. The burned trace is simply the trace with that initial portion removed.

You usually don’t need this anymore with modern samplers such as NUTS or HMC, because they find the typical set very quickly.
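To make the burn-in idea concrete, here is a minimal sketch of a random-walk Metropolis sampler in plain Python (not PyMC3, and not the notebook's model). The standard normal target, the deliberately bad starting point `x0=100.0`, the step size, and the helper names `log_target` and `random_walk_metropolis` are all assumptions chosen for illustration. The only part that mirrors the notebook is the slicing that produces `burned_trace`.

```python
import math
import random
from statistics import mean


def log_target(x):
    """Log-density of a standard normal, up to an additive constant (assumed target)."""
    return -0.5 * x * x


def random_walk_metropolis(n_samples, x0, step_size=1.0, seed=0):
    """Draw n_samples from log_target with a Gaussian random-walk proposal."""
    rng = random.Random(seed)
    x = x0
    trace = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step_size)
        log_accept = log_target(proposal) - log_target(x)
        # Accept uphill moves always, downhill moves with probability exp(log_accept).
        if log_accept >= 0 or rng.random() < math.exp(log_accept):
            x = proposal
        trace.append(x)
    return trace


# Start far from where the target has its mass, so the early samples are unrepresentative.
trace = random_walk_metropolis(20000, x0=100.0)

# Discard the first 1000 samples, analogous to slicing a PyMC3 trace.
burned_trace = trace[1000:]

print("mean of full trace:  ", mean(trace))
print("mean of burned trace:", mean(burned_trace))
```

The true mean of the target is 0. The full trace still contains the samples from the walk down from 100, which pull its mean away from 0, while the burned trace should sit much closer to it.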