I am working through McElreath's Statistical Rethinking and have hit a pretty early wall with some basic PyMC. Specifically, I want to use the quadratic approximation on a Bernoulli trial experiment. When I calculate the MAP I get a reasonable estimate for `p`, but when I try to estimate the standard deviation by inverting the Hessian I get a wild result. I compare this to two other methods: sampling and computing the std from the samples, and the analytic result. For the life of me I cannot figure out what mistake in my MAP implementation is causing such a discrepancy in the estimates.
```python
import numpy as np
import pymc as pm

data = np.array([0, 0, 0, 1, 1, 1, 1, 1, 1])

with pm.Model() as normal_approximation:
    # model
    p = pm.Uniform('p', 0, 1)
    w = pm.Binomial('w', n=len(data), p=p, observed=data.sum())

    # quadratic approximation: MAP plus inverted Hessian
    mean_q = pm.find_MAP()
    H = pm.find_hessian(mean_q, vars=[p])
    cov = np.linalg.inv(H)
    std = np.sqrt(np.diag(cov))[0]

    # sampling method to find the std. dev.
    trace = pm.sample(1000)
    sampling_std = np.std(trace.posterior.p.values)

# analytic solution (quadratic approximation at the mode)
p_hat = 6 / 9
analytic_std = np.sqrt(p_hat * (1 - p_hat) / 9)
```
After executing this, I see `std = 0.63...`, `sampling_std = 0.13...`, and `analytic_std = 0.15...`.
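As a sanity check on those last two numbers (using Beta–Binomial conjugacy for the flat prior): the exact posterior is $\mathrm{Beta}(7, 4)$, whose standard deviation matches the sampling estimate, and the quadratic approximation at the mode $\hat p = 6/9$ reproduces the analytic value:

$$
\operatorname{sd}\left[\mathrm{Beta}(7,4)\right] = \sqrt{\frac{7\cdot 4}{(7+4)^2(7+4+1)}} \approx 0.139,
\qquad
\sqrt{\frac{\hat p(1-\hat p)}{n}} = \sqrt{\frac{(2/3)(1/3)}{9}} \approx 0.157.
$$

So those two methods agree up to the quality of the quadratic approximation; only the Hessian-based 0.63 is out of line.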
It's clear to me that the Hessian calculation is somehow going wrong, and dramatically so. But the model must be specified correctly, or else I wouldn't get a reasonable `std` estimate from the sampling method. Any help would be appreciated. I'm on an M1 Mac running PyMC version 5.21.1.
This appears to be related to the change of variables PyMC uses internally when sampling. You should get the right answer from this model:
```python
with pm.Model() as normal_approximation:
    p = pm.Uniform('p', 0, 1, default_transform=None)
    w = pm.Binomial('w', n=len(data), p=p, observed=data.sum())
```
Though sampling might be harder, I didn’t try it.
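If you need both good sampling and the quadratic approximation, one option (a sketch assembled from the pieces above, untested) is to keep the default transform for NUTS and use a transform-free copy of the model only for the MAP/Hessian step:

```python
import numpy as np
import pymc as pm

data = np.array([0, 0, 0, 1, 1, 1, 1, 1, 1])

# Default interval (log-odds) transform: good geometry for NUTS.
with pm.Model() as sampling_model:
    p = pm.Uniform('p', 0, 1)
    w = pm.Binomial('w', n=len(data), p=p, observed=data.sum())
    trace = pm.sample(1000)

# Transform-free copy: the Hessian is then taken with respect to p itself.
with pm.Model() as quap_model:
    p = pm.Uniform('p', 0, 1, default_transform=None)
    w = pm.Binomial('w', n=len(data), p=p, observed=data.sum())
    mean_q = pm.find_MAP()
    H = pm.find_hessian(mean_q, vars=[p])
    std = np.sqrt(np.diag(np.linalg.inv(H)))[0]
```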
I also got the right answer when I used `pymc_extras.fit_laplace`, which does this quadratic approximation for you (sketch below). I looked at the code in pymc-extras to see if I could quickly discern what was being done differently, but I didn't see anything. So there's definitely a missing piece of the puzzle!
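In case it helps, here is roughly how I called it (a sketch; `fit_laplace`'s exact signature and return structure may differ across pymc-extras versions, so check the docs for your install):

```python
import numpy as np
import pymc as pm
import pymc_extras as pmx

data = np.array([0, 0, 0, 1, 1, 1, 1, 1, 1])

with pm.Model() as m:
    p = pm.Uniform('p', 0, 1)
    w = pm.Binomial('w', n=len(data), p=p, observed=data.sum())

# Fit a Laplace (quadratic) approximation around the MAP; the returned
# InferenceData contains draws from the fitted normal approximation.
idata = pmx.fit_laplace(model=m)
print(idata.posterior['p'].std())
```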
This fixed the immediate problem, but weirdness remains. Adding `default_transform=None` did result in an appropriate `std` estimate from the Hessian method, but then the sampler had divergence issues! Removing `default_transform=None` and re-running fixed both the divergences and the Hessian estimate. No idea how, but toggling `default_transform` seems to have affected some internal state; I now get the correct result for all three methods with the original code. For now, I will mark this as the solution.
While playing around with this more I found an issue: when adding the `default_transform=None` param, I was receiving the correct `std` but getting back the wrong estimate for `p`:
```python
with pm.Model() as model:
    p = pm.Uniform('p', 0, 1, default_transform=None)
    w = pm.Binomial('w', n=len(data), p=p, observed=data.sum())
    mle = pm.find_MAP()
    H = pm.find_hessian(mle)
    std = np.sqrt(np.diag(np.linalg.inv(H)))[0]

>>> mle['p']  # 0.5
>>> std       # 0.16...
```
Clearly `mle` should not be 0.5; it should be 0.6... as in the original model.
It appears to be an issue with the uniform prior. If I re-parameterize this as a beta-binomial with priors `alpha=1`, `beta=1`, I get the exact same flip-flopping behavior, where toggling `default_transform` toggles whether the MLE estimate or the `std` estimate is correct; a sketch of that version is below.
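Here is the flat beta-binomial version I mean (a sketch; since Beta(1, 1) and Uniform(0, 1) are the same distribution, it behaves identically):

```python
with pm.Model() as beta_binom:
    # Beta(1, 1) is flat on (0, 1), i.e. the same prior as Uniform(0, 1);
    # toggling default_transform flips which estimate comes out correct
    p = pm.Beta('p', alpha=1, beta=1, default_transform=None)
    w = pm.Binomial('w', n=len(data), p=p, observed=data.sum())
    mle = pm.find_MAP()
    H = pm.find_hessian(mle, vars=[p])
    std = np.sqrt(np.diag(np.linalg.inv(H)))[0]
```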
However, if I don't choose a flat prior, I do get the right answer with the beta-binomial model:
```python
with pm.Model() as model:
    p = pm.Beta('p', alpha=0.6, beta=0.3, default_transform=None)
    y = pm.Binomial('y', n=len(data), p=p, observed=data.sum())
    mle = pm.find_MAP()
    H = pm.find_hessian(mle, vars=[p])
    std = np.sqrt(np.diag(np.linalg.inv(H)))[0]

>>> mle['p']  # 0.6...
>>> std       # 0.173
```
Very curious.
It's taken me a while to get back to this. Your suggestion worked perfectly: finding the Hessian after removing value transforms yields the expected results. Any insight, or reading you can point me to, on why that is?
The Hessian can be computed in unconstrained or constrained space, and those are generally different. `find_MAP` and `find_hessian` operate on the model's value variables, so with the default transform in place the curvature you invert is with respect to $\operatorname{logit}(p)$, not $p$.
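To make that concrete for this model (a worked check; it relies on the fact that the default interval transform for Uniform(0, 1) is the log-odds transform): with $x = \operatorname{logit}(p)$, the transformed log-posterior picks up the Jacobian term $\log p(1-p)$,

$$
\log \pi(x) = \underbrace{6\log p + 3\log(1-p)}_{\text{likelihood}} + \underbrace{\log p + \log(1-p)}_{\text{Jacobian}} = 7\log p + 4\log(1-p).
$$

Its mode is at $p = 7/11 \approx 0.636$ (which is why `find_MAP` still looked reasonable), and since $dp/dx = p(1-p)$, the second derivative with respect to $x$ is $-11\,p(1-p) = -28/11$ at the mode. Inverting and taking the square root gives

$$
\sigma_x = \sqrt{11/28} \approx 0.63,
$$

exactly the "wild" value from the first post: a perfectly sensible standard deviation, just in logit space rather than for $p$. For further reading, the Stan reference manual's chapter on constraint transforms walks through the same change-of-variables machinery.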