Evaluating log likelihood using estimates from find_MAP results

Background

Let me first start off by saying I ackownledge find_MAP is not recomended. I’m using the function to prove a point in a paper.

I want to compare models fit through MAP and full Bayesian inference for a heirarchical partially pooled non-linear regression model. I find the map by performing

with model:
    maps = pm.find_MAP(method="powell", maxeval=50000)

Upon termination, the log-likelihood is approximately 2250. However, when I evaluate the log probability with model.logp(maps), I get a different log-likelihood value.

Upon inspection, model.test_point and what is returned from find_MAP have different keys.

Here is the return from model.test_point

dict_keys(['log_CL', 'betas_CL_1', 'betas_CL_2', 'betas_CL_3', 'z_CL', 's_CL_log__', 'log_alpha_log_ke_z', 'betas_ke_1', 'betas_ke_2', 'betas_ke_3', 'z_ke', 's_ke_log__', 'betas_ka_1', 'betas_ka_2', 'betas_ka_3', 'z_ka', 's_ka_log__', 'delay_mu_logodds__', 'delay_kappa_log__', 'delays_logodds__', 'sigma_log__'])

and here is the return from find_MAP

dict_keys(['log_CL', 'betas_CL_1', 'betas_CL_2', 'betas_CL_3', 'z_CL', 's_CL_log__', 'log_alpha_log_ke_z', 'betas_ke_1', 'betas_ke_2', 'betas_ke_3', 'z_ke', 's_ke_log__', 'betas_ka_1', 'betas_ka_2', 'betas_ka_3', 'z_ka', 's_ka_log__', 'delay_mu_logodds__', 'delay_kappa_log__', 'delays_logodds__', 'sigma_log__', 's_CL', 'logit_alpha', 'log_ke', 's_ke', 'log_ka', 's_ka', 'delay_mu', 'delay_kappa', 'delays', 'y_est', 'sigma'])

It looks like find_MAP is using random variables on the bounded and unbounded scale. For instance, delay_mu has a Beta prior. In the test point, we see the logodds of that parameter, but in the find_MAP result we see both delay_mu_logodds__ and delay_mu.

Question

  • Is find_MAP attempting to optimize parmameters on the bounded and un bounded scale?
  • Is this intended functionality, and if not how can I rectify this?

This is a great question, and actually you might want to invest further giving that you are writing a paper on this (I gather some information from your tweet of what you are trying to do):

  • find_MAP optimized free parameters on the unbounded scale (in many cases it is also desired as you dont need to specify the boundary of parameters), see a related discussion https://github.com/pymc-devs/pymc3/issues/2545.

  • However, the model log_prob that pm.find_MAP “sees” is different from the one pm.sample “sees”: transforming the bounded free parameters to Real you need to account for the volume changes, however, if you optimized the log_prob you get somewhat unexpected result (there is more discussion here https://github.com/pymc-devs/pymc3/issues/2482)
    As a result, we decide that for find_MAP it will optimized the log_prob without the Jacobian correction (https://github.com/pymc-devs/pymc3/issues/2482#issuecomment-320569496) but still on the unbounded space, in this case you get the same as if you are optimizing with the bounded version.

As implication for your paper, I think the you would get the same MLE if you hand written your likelihood function, but you should be aware that the posterior HMC is sampling from is different from the likelihood you are optimizing - it might make your point stronger to argue the importance of using MC samples.

Thanks, Junpeng. I took a look at Bob Carpenter’s case study that you provided. If you don’t mind, I’d like to ask some questions about how best to translate the moral of that study into PyMC3.

In this model, I am regressing latent variables onto observed covariates. For example, I use sex, weight, and kidney function to measure a parameter called CL. Becuase CL is only physically understood when it is positive, I model it as follows CL \sim \exp(\mathbf{x}\beta + z\sigma ).

I do something like this…

    #Parameter for the intercept
    log_CL = pm.Normal("log_CL", tt.log(3.3), 0.25)

    # Coefficients for the regression
    betas_CL = pm.Normal('betas_CL', mu=0, sigma=0.05, shape=X.shape[1])

    # Use to estimate random effects per patient
    z_CL = pm.Normal("z_CL", 0, 1, shape=len(np.unique(subject_ids)))

    # Population level noise for CL
    s_CL = pm.Lognormal("s_CL", tt.log(0.2), .1)

    Cl = tt.exp(log_CL + pm.math.dot(X,betas_CL) + z_CL[np.unique(subject_ids)] * s_CL)

Because of the parmeterization, should I apply the jacobian correction?

I’m inclined to say No since I am first sampling parameters and then transforming them (as per the The Stan Manual)

You dont need to correct for Cl, as it is observed - however, I am not sure how your likelihood looks like, are you going to model it as pm.Exponential?