Evaluating log likelihood using estimates from find_MAP results

dpananos · January 24, 2020, 3:42pm

Background

Let me first start off by saying I ackownledge find_MAP is not recomended. I’m using the function to prove a point in a paper.

I want to compare models fit through MAP and full Bayesian inference for a heirarchical partially pooled non-linear regression model. I find the map by performing

with model:
    maps = pm.find_MAP(method="powell", maxeval=50000)

Upon termination, the log-likelihood is approximately 2250. However, when I evaluate the log probability with model.logp(maps), I get a different log-likelihood value.

Upon inspection, model.test_point and what is returned from find_MAP have different keys.

Here is the return from model.test_point

dict_keys(['log_CL', 'betas_CL_1', 'betas_CL_2', 'betas_CL_3', 'z_CL', 's_CL_log__', 'log_alpha_log_ke_z', 'betas_ke_1', 'betas_ke_2', 'betas_ke_3', 'z_ke', 's_ke_log__', 'betas_ka_1', 'betas_ka_2', 'betas_ka_3', 'z_ka', 's_ka_log__', 'delay_mu_logodds__', 'delay_kappa_log__', 'delays_logodds__', 'sigma_log__'])

and here is the return from find_MAP

dict_keys(['log_CL', 'betas_CL_1', 'betas_CL_2', 'betas_CL_3', 'z_CL', 's_CL_log__', 'log_alpha_log_ke_z', 'betas_ke_1', 'betas_ke_2', 'betas_ke_3', 'z_ke', 's_ke_log__', 'betas_ka_1', 'betas_ka_2', 'betas_ka_3', 'z_ka', 's_ka_log__', 'delay_mu_logodds__', 'delay_kappa_log__', 'delays_logodds__', 'sigma_log__', 's_CL', 'logit_alpha', 'log_ke', 's_ke', 'log_ka', 's_ka', 'delay_mu', 'delay_kappa', 'delays', 'y_est', 'sigma'])

It looks like find_MAP is using random variables on the bounded and unbounded scale. For instance, delay_mu has a Beta prior. In the test point, we see the logodds of that parameter, but in the find_MAP result we see both delay_mu_logodds__ and delay_mu.

Question

Is find_MAP attempting to optimize parmameters on the bounded and un bounded scale?
Is this intended functionality, and if not how can I rectify this?

junpenglao · January 24, 2020, 4:33pm

This is a great question, and actually you might want to invest further giving that you are writing a paper on this (I gather some information from your tweet of what you are trying to do):

find_MAP optimized free parameters on the unbounded scale (in many cases it is also desired as you dont need to specify the boundary of parameters), see a related discussion https://github.com/pymc-devs/pymc3/issues/2545.
However, the model log_prob that pm.find_MAP “sees” is different from the one pm.sample “sees”: transforming the bounded free parameters to Real you need to account for the volume changes, however, if you optimized the log_prob you get somewhat unexpected result (there is more discussion here https://github.com/pymc-devs/pymc3/issues/2482)
As a result, we decide that for find_MAP it will optimized the log_prob without the Jacobian correction (https://github.com/pymc-devs/pymc3/issues/2482#issuecomment-320569496) but still on the unbounded space, in this case you get the same as if you are optimizing with the bounded version.

As implication for your paper, I think the you would get the same MLE if you hand written your likelihood function, but you should be aware that the posterior HMC is sampling from is different from the likelihood you are optimizing - it might make your point stronger to argue the importance of using MC samples.

dpananos · January 24, 2020, 7:33pm

Thanks, Junpeng. I took a look at Bob Carpenter’s case study that you provided. If you don’t mind, I’d like to ask some questions about how best to translate the moral of that study into PyMC3.

In this model, I am regressing latent variables onto observed covariates. For example, I use sex, weight, and kidney function to measure a parameter called CL. Becuase CL is only physically understood when it is positive, I model it as follows CL \sim \exp(\mathbf{x}\beta + z\sigma ).

I do something like this…

    #Parameter for the intercept
    log_CL = pm.Normal("log_CL", tt.log(3.3), 0.25)

    # Coefficients for the regression
    betas_CL = pm.Normal('betas_CL', mu=0, sigma=0.05, shape=X.shape[1])

    # Use to estimate random effects per patient
    z_CL = pm.Normal("z_CL", 0, 1, shape=len(np.unique(subject_ids)))

    # Population level noise for CL
    s_CL = pm.Lognormal("s_CL", tt.log(0.2), .1)

    Cl = tt.exp(log_CL + pm.math.dot(X,betas_CL) + z_CL[np.unique(subject_ids)] * s_CL)

Because of the parmeterization, should I apply the jacobian correction?

I’m inclined to say No since I am first sampling parameters and then transforming them (as per the The Stan Manual)

junpenglao · January 25, 2020, 9:18am

You dont need to correct for Cl, as it is observed - however, I am not sure how your likelihood looks like, are you going to model it as pm.Exponential?

Topic		Replies	Views
Unconsistent logp, Evaluating the results of find_MAP on Custom Likelihood Questions	3	581	August 28, 2018
Kernel hyper-parameters priors and their use in MAP estimates of Gaussian Processes Questions	6	1264	May 18, 2022
Using find_MAP and hessian to find frequentist standard errors, t-stats and p-values Questions	2	1458	March 31, 2020
What makes difference find_MAP() and pm.sample() Questions	2	2335	October 17, 2021
Simple exponential model fails to find map_estimate Questions	14	549	July 2, 2021

Evaluating log likelihood using estimates from find_MAP results

Background

Question

Related topics