What is the difference between find_MAP() and pm.sample()?

Hi, I'm doing probabilistic regression, in this case using a Gaussian process. After following the GP marginal likelihood tutorial (Marginal Likelihood Implementation), I have a question about how to do inference.

Here is an example of what I did in my code:

```python
import pymc3 as pm

with pm.Model() as model:
    ℓ = pm.Gamma("ℓ", mu=50, sigma=50)
    η = pm.Gamma("η", mu=50, sigma=50)
    # N is the input dimension (6 features in my case)
    cov = η ** 2 * pm.gp.cov.Matern52(N, ℓ)
    gp = pm.gp.Marginal(cov_func=cov)
    σ = pm.Gamma("σ", mu=274, sigma=100)
    y_ = gp.marginal_likelihood("y", X=X, y=y, noise=σ)

    mp = pm.find_MAP()
```

In this case I'm trying to do inference on the model, and I saw that there are many inference methods available in PyMC3. But when I change the line `pm.find_MAP()` to `pm.sample()`, why does it take much, much longer? Even when I change the `init` parameter to MAP…

Note: I have about 1200 data points (X and y), where X has 6 columns/features.

Hi @mozzy ,
`pm.find_MAP` runs an optimizer that gives you exactly one parameter set (the maximum-a-posteriori estimate). If you're lucky, that's the mode of the posterior distribution, but more often than not, and particularly in high-dimensional spaces, a single point is not representative of the posterior.
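
For instance (a minimal sketch, assuming the model above), the result of `find_MAP` is just a dictionary mapping each parameter name to a single value:

```python
mp = pm.find_MAP(model=model)
# one number per parameter, no uncertainty attached
print(mp["ℓ"], mp["η"], mp["σ"])
```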

`pm.sample`, on the other hand, runs MCMC to give you draws from the posterior distribution. This gives you uncertainty information about all the model parameters. For example, `find_MAP` will give you exactly one `ℓ`, but with `pm.sample` you get (samples from) an entire probability distribution over `ℓ`, which is much more expensive computationally.
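
By contrast, a minimal sketch of what sampling gives you (assuming the model above and PyMC3's default `MultiTrace` return value):

```python
with model:
    # NUTS draws from the joint posterior over ℓ, η, and σ
    trace = pm.sample(1000, tune=1000)

# an entire distribution of ℓ values, not a single number
print(trace["ℓ"].shape)                     # (chains * draws,)
print(trace["ℓ"].mean(), trace["ℓ"].std())
```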

For GPs, even though you can visualize an uncertainty band from a MAP estimate alone, this is not yet the fully Bayesian quantification of uncertainty, because it is based on point estimates for the hyperparameters of the GP (ℓ, η).
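
To make that concrete, here is a sketch of the two prediction routes with `gp.Marginal` (`X_new` is a placeholder for your test inputs): `gp.predict` plugs in the single MAP point, while `gp.conditional` plus `pm.sample_posterior_predictive` averages over the posterior draws of the hyperparameters:

```python
with model:
    # MAP route: band reflects only the GP variance at fixed ℓ, η, σ
    mu, var = gp.predict(X_new, point=mp, diag=True)

    # fully Bayesian route: hyperparameter uncertainty is propagated
    f_pred = gp.conditional("f_pred", X_new)
    pred = pm.sample_posterior_predictive(trace, var_names=["f_pred"])
```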

Passing `pm.sample(init="map")` just starts the MCMC at the MAP estimate and is generally considered a bad idea. I recommend leaving the `init` parameter at its default unless you have a good reason to select a different initialization strategy for the `NUTS` sampler.

cheers


I see, thank you for the answer. Maybe I could use a different strategy to compute the hyperparameters.

Thanks!