What is the difference between pm.find_MAP() and pm.sample()?

Hi, I'm doing probabilistic regression, in this case with a Gaussian process. After following the Gaussian process tutorial "Marginal Likelihood Implementation", I have a question about how to do inference.

Here is an example of what I did in my code:

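import pymc3 as pm

# X (about 1200 rows, 6 feature columns) and y are the training data.
# N is passed as the kernel's input_dim (presumably the number of feature columns).
with pm.Model() as model:
    # Priors on the kernel hyperparameters: lengthscale and amplitude
    ℓ = pm.Gamma("ℓ", mu=50, sigma=50)
    η = pm.Gamma("η", mu=50, sigma=50)
    cov = η ** 2 * pm.gp.cov.Matern52(N, ℓ)
    gp = pm.gp.Marginal(cov_func=cov)
    # Prior on the observation noise
    σ = pm.Gamma("σ", mu=274, sigma=100)
    y_ = gp.marginal_likelihood("y", X=X, y=y, noise=σ)

    # Point estimate; must be called inside the model context
    mp = pm.find_MAP()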

Here I am trying to run inference on the model, and as far as I understand there are many methods available in PyMC3. But when I change the line pm.find_MAP() to pm.sample(), why does it take so much longer, even when I set the init parameter to MAP?

Note: I have about 1200 data points (X and y), and X has 6 columns/features.

Hi @mozzy,
pm.find_MAP runs an optimization that gives you exactly one parameter set (the maximum a posteriori estimate). If you're lucky, that is the mode of the posterior distribution, but more often than not, and particularly in high-dimensional spaces, it is not representative of the posterior as a whole.

pm.sample, on the other hand, runs MCMC to give you draws from the posterior distribution. This gives you uncertainty information about all the model parameters. For example, find_MAP will give you exactly one value of ℓ, but with pm.sample you get (samples from) an entire probability distribution over ℓ, which is much more expensive computationally.
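To make the contrast concrete, here is a minimal sketch that uses only the Python standard library. It is not the GP model from the question and not how PyMC3 works internally: the target is a toy one-dimensional Gamma(shape=3, rate=1) density, the optimizer is a plain golden-section search, and the sampler is random-walk Metropolis standing in for NUTS. All function and variable names are made up for illustration. It shows that the optimizer returns a single number after a few dozen density evaluations, while the sampler needs thousands of evaluations but yields a mean and an interval as well.

```python
import math
import random


def log_post(x):
    """Unnormalized log-density of Gamma(shape=3, rate=1): mode 2, mean 3."""
    if x <= 0:
        return -math.inf
    return 2.0 * math.log(x) - x


def find_map(lo=1e-6, hi=50.0, tol=1e-6):
    """Golden-section search for the maximizer; returns (x, n_evals)."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b, n_evals = lo, hi, 0
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        n_evals += 2
        if log_post(c) > log_post(d):
            b = d  # maximum lies in [a, d]
        else:
            a = c  # maximum lies in [c, b]
    return (a + b) / 2.0, n_evals


def metropolis(n_draws=20000, step=2.0, start=1.0, seed=0):
    """Random-walk Metropolis; returns (draws, n_evals)."""
    rng = random.Random(seed)
    x, lp = start, log_post(start)
    draws, n_evals = [], 1
    for _ in range(n_draws):
        prop = x + rng.gauss(0.0, step)
        lp_prop = log_post(prop)
        n_evals += 1
        # Accept with probability min(1, p(prop) / p(x))
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            x, lp = prop, lp_prop
        draws.append(x)
    return draws, n_evals


map_x, map_evals = find_map()
draws, mcmc_evals = metropolis()

ordered = sorted(draws)
mean = sum(draws) / len(draws)
lo5 = ordered[len(ordered) // 20]
hi95 = ordered[(19 * len(ordered)) // 20]

print(f"MAP: {map_x:.3f} using {map_evals} density evaluations")
print(f"MCMC: mean {mean:.3f}, 90% interval ({lo5:.3f}, {hi95:.3f}), "
      f"using {mcmc_evals} density evaluations")
```

Because this density is skewed, the single MAP value sits below the posterior mean and says nothing about the spread. In a GP model like the one above, every density evaluation is far more expensive than in this toy case, so the gap in run time is felt much more strongly.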

For GPs, you can visualize a density band even from a MAP estimate, but this is not yet a fully Bayesian quantification of uncertainty, because it is based on point estimates for the hyperparameters of the GP (ℓ, η).

Passing init="map" to pm.sample just starts the MCMC at the MAP estimate, which is generally considered a bad idea. I recommend leaving the pm.sample(init=...) parameter at its default unless you have good reasons to select a different initialization strategy for the NUTS sampler.



I see, thank you for the answer. Maybe I could try a different strategy to compute the hyperparameters.