Hi
I’m using pymc on a hierarchical binomial model. The logit of the probability is a Gaussian with a certain std.dev. that I am trying to recover (called ‘si’ in the code’).
While the MC samples look fine, find_MAP() does not seem to find the right values when data is limited (N=100 in the code). In particular, it underestimates its value, even when initialized with the ground truth value.
Note, for sufficient data (N=1000 in the script), things look fine.
Is there a way to make find_MAP more robust when data is limited?
import pymc as pm
import numpy as np
import arviz as az
import matplotlib.pyplot as plt
import sys
plt.ion()
plt.close("all")
M=100 # number of experiments
N=100 # binomial N per experiment
# when N=1000, MAP solution looks fine.
# but when N=100, MAP finds a much too low value for sigma.
N_ar= np.array( [N]*M )
pbin = 0.5 # binomial probability
Xp = np.log(pbin/(1-pbin)) #logit of p
siX = 0.2 # add variability to logit p
p_ar = 1/(1 + np.exp(-np.random.normal(Xp,siX, M)))
print('ground truth: mean(X)= ', Xp, ', sigma(X)= ', siX)
# generate data
k_ar=np.random.binomial(N_ar, p_ar )
with pm.Model():
mu = pm.Normal("mu", 0, sigma=10)
si = pm.HalfNormal("si", 1) # pushes to be small
lth = pm.Normal("lth", mu, si, shape = N_ar.size )
yobs = pm.Binomial("yobs", n=N_ar, logit_p = lth, observed=k_ar)
trace = pm.sample(cores=4, chains =4, target_accept=0.95)
MAP =pm.find_MAP(start = {'si': siX, 'mu':Xp} )
print('MAP results: mu = ', MAP['mu'], 'si = ', MAP['si']) # si is wrong when N=100
# MAP =pm.find_MAP(method='Powell') #even worse
# print('MAP results with Powell: ', MAP)
print('MC results: ', pm.stats.summary(trace, var_names=['mu','si'] ) ) #this works fine