What is the difference between find_MAP() and pm.sample()?

Hi, I’m doing probabilistic regression using a Gaussian process. After following the Marginal Likelihood Implementation tutorial, I have a question about how to do inference.

Here is an example of what I did in my code:

with pm.Model() as model:
    ℓ = pm.Gamma("ℓ", mu=50, sigma=50)
    η = pm.Gamma("η", mu=50, sigma=50)
    cov = η ** 2 * pm.gp.cov.Matern52(N, ℓ)
    gp = pm.gp.Marginal(cov_func=cov)
    σ = pm.Gamma("σ", mu=274, sigma=100)
    y_ = gp.marginal_likelihood("y", X=X, y=y, noise=σ)

    mp = pm.find_MAP()

In this case I’m trying to do inference on the model, and I saw that there are many methods available in PyMC3. But when I change pm.find_MAP() to pm.sample(), why does it take so much longer, even when I change the init parameter to MAP?

Note: I have about 1200 data points (X and y), and X has 6 columns/features.

Hi @mozzy ,
pm.find_MAP runs an optimization that gives you exactly one parameter set (the maximum a posteriori estimate). If you’re lucky that’s the mode of the posterior distribution, but more often than not, and particularly in high-dimensional spaces, it’s not representative.
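To make the distinction concrete, here is a minimal sketch (not PyMC internals; the toy model and all values are made up) of what MAP estimation boils down to: optimizing the log posterior to get a single point.

```python
# Toy model: unknown mean mu with a Normal(0, 2) prior and a
# Normal(mu, 1) likelihood. find_MAP-style inference = optimize
# the log posterior and return the single best point.
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(0)
data = rng.normal(loc=1.5, scale=1.0, size=50)

def neg_log_posterior(mu):
    log_prior = stats.norm.logpdf(mu, loc=0.0, scale=2.0)
    log_lik = stats.norm.logpdf(data, loc=mu, scale=1.0).sum()
    return -(log_prior + log_lik)

result = optimize.minimize_scalar(neg_log_posterior)
mu_map = result.x  # one number, with no uncertainty attached
print(f"MAP estimate of mu: {mu_map:.3f}")
```

Note that the output is a single number; any information about how spread out the posterior is around that point is discarded.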

pm.sample, on the other hand, runs MCMC to give you draws from the posterior distribution. This gives you uncertainty information about all the model parameters. For example, find_MAP will give you exactly one value per parameter, but with pm.sample you get (samples from) an entire probability distribution over the parameters, which is much more expensive computationally.
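For contrast, here is a sketch of sampling on the same made-up toy model (a crude random-walk Metropolis sampler, not PyMC’s NUTS). Even this simple sampler needs thousands of log-posterior evaluations, which is one reason pm.sample costs far more than a single optimization run.

```python
# Toy model as before: Normal(0, 2) prior on mu, Normal(mu, 1) likelihood.
# MCMC returns draws from the posterior instead of one point.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(loc=1.5, scale=1.0, size=50)

def log_posterior(mu):
    return (stats.norm.logpdf(mu, loc=0.0, scale=2.0)
            + stats.norm.logpdf(data, loc=mu, scale=1.0).sum())

mu, draws = 0.0, []
for _ in range(5000):
    proposal = mu + rng.normal(scale=0.3)      # random-walk proposal
    if np.log(rng.uniform()) < log_posterior(proposal) - log_posterior(mu):
        mu = proposal                          # accept the proposal
    draws.append(mu)

draws = np.array(draws[1000:])                 # discard burn-in
print(f"posterior mean +/- sd: {draws.mean():.3f} +/- {draws.std():.3f}")
```

The draws give you both a central value and a spread, i.e. the uncertainty information that a MAP point estimate cannot provide.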

For GPs, even though you can visualize a density band from a MAP estimate, this is not yet fully Bayesian uncertainty quantification, because it is based on point estimates for the hyperparameters of the GP (ℓ, η).
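The point above can be sketched in plain numpy: the predictive band below is computed with the GP hyperparameters held fixed (as they would be after find_MAP), so it reflects only function and noise uncertainty, not uncertainty about ℓ, η, σ themselves. The RBF kernel, data, and hyperparameter values are illustrative assumptions, not taken from the thread.

```python
# GP regression with FIXED hyperparameters (pretend they are MAP values).
# The resulting band is conditional on ell/eta/sigma, not marginalized
# over them, which is why it understates the total uncertainty.
import numpy as np

def rbf(Xa, Xb, ell, eta):
    d2 = (Xa[:, None] - Xb[None, :]) ** 2
    return eta**2 * np.exp(-0.5 * d2 / ell**2)

rng = np.random.default_rng(1)
X = np.linspace(0, 10, 30)
y = np.sin(X) + rng.normal(scale=0.2, size=30)

ell, eta, sigma = 1.0, 1.0, 0.2            # assumed point estimates
K = rbf(X, X, ell, eta) + sigma**2 * np.eye(30)
Xs = np.linspace(0, 10, 100)               # prediction grid
Ks = rbf(Xs, X, ell, eta)

# Standard GP posterior predictive via a Cholesky factorization.
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
mean = Ks @ alpha
v = np.linalg.solve(L, Ks.T)
var = np.diag(rbf(Xs, Xs, ell, eta)) - np.sum(v**2, axis=0)
band = 2 * np.sqrt(np.clip(var, 0, None))  # ~95% band, given ell and eta
print(f"mean band half-width: {band.mean():.3f}")
```

With pm.sample you would instead get one such band per posterior draw of (ℓ, η, σ), and averaging over them widens the band where the hyperparameters are poorly determined.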

Passing pm.sample(init="map") just starts the MCMC at the MAP estimate, and that is generally considered a bad idea. I recommend leaving the pm.sample(init=...) parameter at its default unless you have good reasons to select a different initialization strategy for the NUTS sampler.

cheers


I see, thank you for the answer. Maybe I could use a different strategy to compute the hyperparameters.

Thanks!