I am working through McElreath's Statistical Rethinking and have hit a pretty early wall with some basic PyMC. Specifically, I want to use the quadratic approximation on a Bernoulli trial experiment. When I calculate the MAP I get a reasonable estimate for `p`, but when I try to estimate the standard deviation by inverting the Hessian I get a wild result. I compare this to two other methods: sampling and computing the std from the samples, and the analytic result. For the life of me I cannot figure out what mistake in my MAP implementation is causing such a discrepancy in the estimates.
```python
import numpy as np
import pymc as pm

data = np.array([0, 0, 0, 1, 1, 1, 1, 1, 1])

with pm.Model() as normal_approximation:
    # model
    p = pm.Uniform('p', 0, 1)
    w = pm.Binomial('w', n=len(data), p=p, observed=data.sum())

    # quadratic approximation: MAP plus inverted Hessian
    mean_q = pm.find_MAP()
    H = pm.find_hessian(mean_q, vars=[p])
    cov = np.linalg.inv(H)
    std = np.sqrt(np.diag(cov))[0]

    # sampling method to find the std. dev.
    trace = pm.sample(1000)
    sampling_std = np.std(trace.posterior.p.values)

# analytic solution (quadratic approximation at the mode)
p_hat = 6 / 9
analytic_std = np.sqrt(p_hat * (1 - p_hat) / 9)
```
After executing this, I see `std = 0.63...`, `sampling_std = 0.13...`, and `analytic_std = 0.15...`.
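As a sanity check on those last two numbers (using Beta–Binomial conjugacy for the flat prior): the exact posterior is $\mathrm{Beta}(7, 4)$, whose standard deviation matches the sampling estimate, and the quadratic approximation at the mode $\hat p = 6/9$ reproduces the analytic value:

$$
\operatorname{sd}\left[\mathrm{Beta}(7,4)\right] = \sqrt{\frac{7\cdot 4}{(7+4)^2(7+4+1)}} \approx 0.139,
\qquad
\sqrt{\frac{\hat p(1-\hat p)}{n}} = \sqrt{\frac{(2/3)(1/3)}{9}} \approx 0.157.
$$

So those two methods agree up to the quality of the quadratic approximation; only the Hessian-based 0.63 is out of line.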
It's clear to me that the Hessian calculation is somehow going wrong, and dramatically so. But the model must be specified correctly, or else I wouldn't get a reasonable `std` estimate from the sampling method. Any help would be appreciated. I'm on an M1 Mac running PyMC version 5.21.1.
This appears to be related to the change of variables PyMC uses internally when sampling. You should get the right answer from this model:
```python
with pm.Model() as normal_approximation:
    p = pm.Uniform('p', 0, 1, default_transform=None)
    w = pm.Binomial('w', n=len(data), p=p, observed=data.sum())
```
Though sampling might be harder, I didn’t try it.
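If you need both good sampling and the quadratic approximation, one option (a sketch assembled from the pieces above, untested) is to keep the default transform for NUTS and use a transform-free copy of the model only for the MAP/Hessian step:

```python
import numpy as np
import pymc as pm

data = np.array([0, 0, 0, 1, 1, 1, 1, 1, 1])

# Default interval (log-odds) transform: good geometry for NUTS.
with pm.Model() as sampling_model:
    p = pm.Uniform('p', 0, 1)
    w = pm.Binomial('w', n=len(data), p=p, observed=data.sum())
    trace = pm.sample(1000)

# Transform-free copy: the Hessian is then taken with respect to p itself.
with pm.Model() as quap_model:
    p = pm.Uniform('p', 0, 1, default_transform=None)
    w = pm.Binomial('w', n=len(data), p=p, observed=data.sum())
    mean_q = pm.find_MAP()
    H = pm.find_hessian(mean_q, vars=[p])
    std = np.sqrt(np.diag(np.linalg.inv(H)))[0]
```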
I also got the right answer when I used `pymc_extras.fit_laplace`, which does this quadratic approximation for you (sketch below). I looked at the code in pymc-extras to see if I could quickly discern what was being done differently, but I didn't see anything. So there's definitely a missing piece of the puzzle!
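In case it helps, here is roughly how I called it (a sketch; `fit_laplace`'s exact signature and return structure may differ across pymc-extras versions, so check the docs for your install):

```python
import numpy as np
import pymc as pm
import pymc_extras as pmx

data = np.array([0, 0, 0, 1, 1, 1, 1, 1, 1])

with pm.Model() as m:
    p = pm.Uniform('p', 0, 1)
    w = pm.Binomial('w', n=len(data), p=p, observed=data.sum())

# Fit a Laplace (quadratic) approximation around the MAP; the returned
# InferenceData contains draws from the fitted normal approximation.
idata = pmx.fit_laplace(model=m)
print(idata.posterior['p'].std())
```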
This fixed the immediate problem, but weirdness remains. Adding `default_transform=None` did result in an appropriate `std` estimate from the Hessian method, but then the sampler had divergence issues! Removing `default_transform=None` and re-running fixed both the divergences and the Hessian estimate. No idea how, but toggling `default_transform` seems to have affected some internal state; I now get the correct result for all three methods with the original code. For now, I will mark this as the solution.
While playing around with this more I found an issue: when adding the `default_transform=None` param, I was receiving the correct `std` but getting back the wrong estimate for `p`:
```python
with pm.Model() as model:
    p = pm.Uniform('p', 0, 1, default_transform=None)
    w = pm.Binomial('w', n=len(data), p=p, observed=data.sum())
    mle = pm.find_MAP()
    H = pm.find_hessian(mle)
    std = np.sqrt(np.diag(np.linalg.inv(H)))[0]

>>> mle['p']  # 0.5
>>> std       # 0.16...
```
Clearly `mle` should not be 0.5; it should be 0.6... as in the original model.
It appears to be an issue with the uniform prior. If I re-parameterize this as a beta-binomial with priors `alpha=1`, `beta=1`, I get the exact same flip-flopping behavior, where toggling `default_transform` toggles whether the MLE estimate or the `std` estimate is correct; a sketch of that version is below.
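Here is the flat beta-binomial version I mean (a sketch; since Beta(1, 1) and Uniform(0, 1) are the same distribution, it behaves identically):

```python
with pm.Model() as beta_binom:
    # Beta(1, 1) is flat on (0, 1), i.e. the same prior as Uniform(0, 1);
    # toggling default_transform flips which estimate comes out correct
    p = pm.Beta('p', alpha=1, beta=1, default_transform=None)
    w = pm.Binomial('w', n=len(data), p=p, observed=data.sum())
    mle = pm.find_MAP()
    H = pm.find_hessian(mle, vars=[p])
    std = np.sqrt(np.diag(np.linalg.inv(H)))[0]
```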
However, if I don't choose a flat prior, I do get the right answer with the beta-binomial model:
```python
with pm.Model() as model:
    p = pm.Beta('p', alpha=0.6, beta=0.3, default_transform=None)
    y = pm.Binomial('y', n=len(data), p=p, observed=data.sum())
    mle = pm.find_MAP()
    H = pm.find_hessian(mle, vars=[p])
    std = np.sqrt(np.diag(np.linalg.inv(H)))[0]

>>> mle['p']  # 0.6...
>>> std       # 0.173
```
Very curious.
It's taken me a while to get back to this. Your suggestion worked perfectly: finding the Hessian after removing value transforms yields the expected results. Any insight, or reading you can point me to, on why that is?
The Hessian can be computed in unconstrained or constrained space, and those are generally different. `find_MAP` and `find_hessian` operate on the model's value variables, so with the default transform in place the curvature you invert is with respect to $\operatorname{logit}(p)$, not $p$.
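To make that concrete for this model (a worked check; it relies on the fact that the default interval transform for Uniform(0, 1) is the log-odds transform): with $x = \operatorname{logit}(p)$, the transformed log-posterior picks up the Jacobian term $\log p(1-p)$,

$$
\log \pi(x) = \underbrace{6\log p + 3\log(1-p)}_{\text{likelihood}} + \underbrace{\log p + \log(1-p)}_{\text{Jacobian}} = 7\log p + 4\log(1-p).
$$

Its mode is at $p = 7/11 \approx 0.636$ (which is why `find_MAP` still looked reasonable), and since $dp/dx = p(1-p)$, the second derivative with respect to $x$ is $-11\,p(1-p) = -28/11$ at the mode. Inverting and taking the square root gives

$$
\sigma_x = \sqrt{11/28} \approx 0.63,
$$

exactly the "wild" value from the first post: a perfectly sensible standard deviation, just in logit space rather than for $p$. For further reading, the Stan reference manual's chapter on constraint transforms walks through the same change-of-variables machinery.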