find_MAP struggles with hierarchical model

Hi,
I’m using pymc on a hierarchical binomial model. The logit of the probability is Gaussian with a certain std. dev. that I am trying to recover (called ‘si’ in the code).

While the MC samples look fine, find_MAP() does not seem to find the right values when data is limited (N=100 in the code). In particular, it underestimates si, even when initialized at the ground-truth value.

Note that with sufficient data (N=1000 in the script), things look fine.

Is there a way to make find_MAP more robust when data is limited?

import pymc as pm
import numpy as np
import arviz as az
import matplotlib.pyplot as plt

plt.ion()
plt.close("all")

M = 100  # number of experiments
N = 100  # binomial N per experiment
# when N=1000, the MAP solution looks fine,
# but when N=100, MAP finds a much too low value for sigma.

N_ar = np.array([N] * M)
pbin = 0.5  # binomial probability

Xp = np.log(pbin / (1 - pbin))  # logit of p
siX = 0.2  # std. dev. of logit p across experiments
p_ar = 1 / (1 + np.exp(-np.random.normal(Xp, siX, M)))
print("ground truth:  mean(X) =", Xp, ", sigma(X) =", siX)

# generate data
k_ar = np.random.binomial(N_ar, p_ar)

with pm.Model():
    mu = pm.Normal("mu", 0, sigma=10)
    si = pm.HalfNormal("si", 1)  # pushes si to be small
    lth = pm.Normal("lth", mu, si, shape=N_ar.size)
    yobs = pm.Binomial("yobs", n=N_ar, logit_p=lth, observed=k_ar)
    trace = pm.sample(cores=4, chains=4, target_accept=0.95)

    MAP = pm.find_MAP(start={"si": siX, "mu": Xp})
    print("MAP results: mu =", MAP["mu"], ", si =", MAP["si"])  # si is wrong when N=100

    # MAP = pm.find_MAP(method="Powell")  # even worse
    # print("MAP results with Powell:", MAP)

    print("MC results:", az.summary(trace, var_names=["mu", "si"]))  # this works fine

This is a known phenomenon, and basically the reason why MAP estimation has fallen out of favor. You can somewhat mitigate it by using a zero-avoiding prior on the sigma of the group-level effect, e.g. si = pm.Gamma('si', alpha=2, beta=0.1) (the rate must be strictly positive; a small rate keeps the prior weakly informative). This is the advice of the Stan prior-choice FAQ.
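To see what "zero-avoiding" buys you, here is a small sketch (my own, assuming scipy is available; beta=0.1 is just an illustrative rate): a Gamma(2, beta) density is proportional to sigma * exp(-beta * sigma), so it is exactly zero at sigma = 0 and pushes the mode off the boundary, whereas a HalfNormal density is maximal at sigma = 0 and happily lets the optimizer sit there.

```python
# Compare prior densities near sigma = 0 (illustrative values, not from the post)
import numpy as np
from scipy import stats

beta = 0.1  # small rate -> weakly informative (assumed value)
gamma_prior = stats.gamma(a=2, scale=1 / beta)   # Gamma(alpha=2, rate=0.1)
halfnormal_prior = stats.halfnorm(scale=1)       # the prior in the original model

sigmas = np.array([1e-6, 0.01, 0.2, 1.0])
print("Gamma(2, 0.1) pdf: ", gamma_prior.pdf(sigmas))
print("HalfNormal(1) pdf: ", halfnormal_prior.pdf(sigmas))

# At the boundary itself: Gamma(2, .) vanishes, HalfNormal does not.
print(gamma_prior.pdf(0.0))       # 0.0
print(halfnormal_prior.pdf(0.0))  # sqrt(2/pi), about 0.80
```

The vanishing density at zero is what keeps the joint mode away from the sigma = 0 boundary.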

In general, if you have a hierarchical model you should be doing MCMC.

Thank you for the quick reply.
(This nice community is sufficient reason to do MCMC :)


If you really need a faster approximation than MCMC, try ADVI or pathfinder. We just buffed the implementation of pathfinder in pymc-extras, so that’s worth a second look if you’ve tried it before. There’s also a MAP+Laplace approximation in extras that can be good too (if MAP works on your problem in the first place, of course). Example of that one is here.
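For intuition on what MAP + Laplace does, here is a minimal 1-D sketch of my own (scipy only, not the pymc-extras API): for a binomial probability under a flat prior the posterior is a known Beta, so we can check the Laplace approximation, N(mode, -1/logpost''(mode)), against the exact posterior standard deviation.

```python
# MAP + Laplace on a toy 1-D posterior: p | k ~ Beta(k+1, n-k+1) under a flat prior.
import numpy as np
from scipy import optimize, stats

n, k = 50, 18  # assumed toy data: 18 successes in 50 trials

def neg_logpost(p):
    # negative log posterior up to a constant (flat prior on p)
    return -(k * np.log(p) + (n - k) * np.log(1 - p))

res = optimize.minimize_scalar(neg_logpost, bounds=(1e-6, 1 - 1e-6), method="bounded")
p_map = res.x  # should land at k/n

# Laplace: curvature from the analytic second derivative of the log posterior
d2 = -(k / p_map**2 + (n - k) / (1 - p_map) ** 2)
laplace_sd = np.sqrt(-1.0 / d2)

exact = stats.beta(k + 1, n - k + 1)
print(f"MAP:        {p_map:.4f}  (k/n = {k / n:.4f})")
print(f"Laplace sd: {laplace_sd:.4f}")
print(f"Exact sd:   {exact.std():.4f}")
```

With plenty of data and a roughly Gaussian posterior the two standard deviations agree well; the hierarchical-sigma case above is precisely where this breaks down, since the mode itself is in the wrong place.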
