I am not sure if I follow “marginal likelihood is first maximized, then the (log) prior is added and then the maximum of the posterior found”, but as far as I know, `find_MAP()` minimizes the negative joint log probability (the unnormalized log posterior) of your model.
To be clear, say you have a model like the one given below:
```python
import pymc3 as pm

with pm.Model() as model:
    ℓ = pm.Gamma("ℓ", alpha=2, beta=1)
    η = pm.HalfCauchy("η", beta=5)

    cov = η**2 * pm.gp.cov.Matern52(1, ℓ)
    gp = pm.gp.Marginal(cov_func=cov)
    # noise is a required argument of marginal_likelihood in PyMC3;
    # it could also be a random variable instead of a fixed value
    f = gp.marginal_likelihood("f", X=X, y=y, noise=0.1)

    map_estimates = pm.find_MAP()
```
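Here `X` and `y` are whatever training inputs and targets you already have. If you want the snippet to run end to end, some hypothetical toy data like this (defined before the model block) would do:

```python
import numpy as np

# toy 1-D regression data: X must be a column matrix of shape (n, 1)
# for the 1-dimensional Matern52 kernel, and y a 1-D array of targets
X = np.linspace(0, 10, 50)[:, None]
y = np.sin(X).ravel() + 0.3 * np.random.randn(50)
```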
This will build the joint log probability of your model and minimize its negative with respect to the free continuous variables (here ℓ and η). In this example the joint log probability, $\log P(\ell, \eta \mid y) = \log P(\ell) + \log P(\eta) + \log P(y \mid \ell, \eta) + \text{const}$, is the sum of the log densities of `pm.Gamma` (the prior on ℓ), `pm.HalfCauchy` (the prior on η) and `pm.MvNormal` (the GP’s marginal likelihood of `y`). Maximizing this log probability is, in general, a non-convex optimization problem, but methods like L-BFGS-B work out pretty well on small to medium sized data.
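To make the connection to the optimizer explicit, here is a small sketch reusing the model above. I'm relying on `find_MAP`'s `method` and `return_raw` arguments, which hand the problem to `scipy.optimize.minimize` and return its raw result alongside the estimates:

```python
with model:
    # method is passed on to scipy.optimize.minimize; "L-BFGS-B" is the default
    map_estimates, opt_result = pm.find_MAP(method="L-BFGS-B", return_raw=True)

# opt_result.fun is the minimized objective: the negative log posterior
# evaluated on the transformed (unconstrained) parameter space
print(opt_result.fun, opt_result.success)
print(map_estimates["ℓ"], map_estimates["η"])
```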
I’ve read, however, that for marginal likelihood models the prior just provides starting points for the marginal likelihood maximization, to get around its non-convex nature.
You can do this in PyMC3 either by providing a `testval` to the distribution or by passing the `start` parameter to `find_MAP()`. Here’s an example:
```python
with pm.Model() as model:
    ℓ = pm.Gamma("ℓ", alpha=2, beta=1, testval=0.7)
    η = pm.HalfCauchy("η", beta=5, testval=10.0)

    cov = η**2 * pm.gp.cov.Matern52(1, ℓ)
    gp = pm.gp.Marginal(cov_func=cov)
    # the marginal likelihood is tied to the observed y, so it takes no testval
    f = gp.marginal_likelihood("f", X=X, y=y, noise=0.1)

    map_estimates = pm.find_MAP()
```
or by using the `start` parameter of the `find_MAP()` function, as shown below:
```python
with model:
    map_estimates = pm.find_MAP(start={'ℓ': 0.7, 'η': 10.0})
```
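Note that only the free hyperparameters (here ℓ and η) take starting values; the GP’s marginal likelihood is fixed to the observed `y`. Also, if I remember the PyMC3 behaviour correctly, any variable missing from `start` falls back to its default test value, so a partial dictionary works too; a minimal sketch:

```python
with model:
    # only override ℓ; η should start from its default test value
    map_estimates = pm.find_MAP(start={'ℓ': 0.7})

print(map_estimates)  # the returned dict also includes the log-transformed copies by default
```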