Find_hessian version differences

opherdonchin · November 2, 2022, 10:41pm

The following code seems to produce different answers under PyMC and PyMC3.

data = np.repeat([0, 1], (10, 3))
with pm.Model() as normal_aproximation:
    p = pm.Beta('p', 1., 1.)
    w = pm.Binomial('w',n=1, p=p, observed=data)
    mean_q = pm.find_MAP()
    std_q = ((1/pm.find_hessian(mean_q, vars=[p]))**0.5)[0]
mean_q['p'], std_q

Under PyMC3, I get:

(array(0.23076923), array([0.11685454]))

Which seems like the right answer.

Under PyMC, I get:

(array(0.23076923), array([0.61282588]))

I don’t see why this should be. Is there some API difference that I didn’t notice?

Opher

aseyboldt · November 2, 2022, 11:26pm

I’m not 100% sure yet, but it seems like it used to compute the hessian (or actually negative of the hessian, as I just noticed…) on the untransformed space, but now it computes it on the transformed space. There is also the question if the jacobian terms should be included here, they are not included in the find_MAP call, but are included in the find_hessian call…

It does seem however that the derivatives are correct on the transformed space at least:

data = np.repeat([0, 1], (10, 3))
with pm.Model() as normal_aproximation:
    p = pm.Beta('p', 1., 1.)
    w = pm.Binomial('w',n=1, p=p, observed=data)
    mean_q = pm.find_MAP()

from scipy import optimize

jac = True
logp = normal_aproximation.logp(jacobian=jac)

func = aesara.function(normal_aproximation.value_vars, logp)

d2logp = normal_aproximation.compile_d2logp(jacobian=jac)
hesse = d2logp({"p_logodds__" : mean_q["p_logodds__"]})

x = np.linspace(mean_q["p_logodds__"] - 0.5, mean_q["p_logodds__"] + 0.5)

y = np.array([func(val) for val in x])

p = mean_q["p_logodds__"]

dlogp = normal_aproximation.compile_dlogp(jacobian=jac)
dy = dlogp({"p_logodds__": p})
ddy = d2logp({"p_logodds__": p}).ravel()

plt.plot(x, y)
plt.axvline(mean_q["p_logodds__"])
plt.plot(x, func(p) + (x - p) * dy)
plt.plot(x, func(p) + (x - p) * dy - (x - p) ** 2 * ddy / 2)

ricardoV94 · November 3, 2022, 6:53am

Related to Numerical differences between 3.11.4 and 4.0.0b · Issue #5443 · pymc-devs/pymc · GitHub

opherdonchin · November 7, 2022, 7:50am

The issue that @ricardoV94 linked had a couple of lines of code that produced the expected answer:

with pm.Model() as normal_aproximation:
    p = pm.Beta('p', alpha=1., beta=1.)
    w = pm.Binomial('w',n=1, p=p, observed=data)
    mean_q = pm.find_MAP()

    p_value = normal_aproximation.rvs_to_values[p]
    p_value.tag.transform = None
    p_value.name = p.name

    std_q = ((1/pm.find_hessian(mean_q, vars=[p]))**0.5)[0]

It’s not clear to me whether this has not permanently changed the normal_aproximation model.

Opher

ricardoV94 · November 7, 2022, 8:10am

Yes that code changes the original model. You could reset the value name and tag to their original variables.

A more elegant solution would probably be to add the option to compute the hessian in the untransformed space in find_hessian.

But I am not sure what is the goal here. Is this just a learning exercise or a really needed feature?

opherdonchin · November 7, 2022, 8:36am

I’m teaching a Bayesian course based partly on @aloctavodia’s Bayesian Analysis in Python, but I’m updating the notebooks to use PyMC instead of PyMC3.

In Chapter 8, he uses the MAP and the Hessian in his code to demonstrate the Gaussian approximation and I needed to update it for the students.

More generally, I can imagine that there is use in getting a Gaussian approximation of the untransfomed mode. Gaussian approximations are quite generally useful.

Opher

aloctavodia · November 7, 2022, 10:37am

Hi,

Bambi offers the laplace/quadratic/Gaussian approximation as one of its inference methods. The code is here

github.com

bambinos/bambi/blob/e05daf1de6e5ca3c111b9221f44d3939c1aaaf82/bambi/backend/pymc.py#L445-L486


      
          def _run_laplace(self, draws, omit_offsets, include_mean):
              """Fit a model using a Laplace approximation.
          
          
    Mainly for pedagogical use, provides reasonable results for approximately
              Gaussian posteriors. The approximation can be very poor for some models
              like hierarchical ones. Use ``mcmc``, ``nuts_numpyro``, ``nuts_blackjax``
              or ``vi`` for better approximations.
          
          
    Parameters
              ----------
              draws: int
                  The number of samples to draw from the posterior distribution.
              omit_offsets: bool
                  Omits offset terms in the ``InferenceData`` object returned when the model includes
                  group specific effects.
              include_mean: bool
                  Compute the posterior of the mean response.
          
          
    Returns
              -------

This file has been truncated. show original

opherdonchin · November 7, 2022, 11:16am

That’s good to know. I could just tell them a couple of words about bambi (not a bad idea in any case) and then use this function.

On the other hand, looking at the source, it seems like this code is calling find_hessian on the untransformed parameter. Why can’t this work just the same outside this function (just changing the name to the result of get_untransformed_name.)

Opher

ricardoV94 · November 7, 2022, 11:18am

@aloctavodia I think that snippet suffers the same “problem” (assuming you don’t want the hessian on the transformed space) as in this issue with PyMC v4.

ricardoV94 · November 7, 2022, 11:25am

Thanks for the example. I think we need to improve our API to allow to easily define expressions on the transformed and untransformed space. I don’t think the old API was that great though…

ricardoV94 · November 7, 2022, 11:28am

I think the easiest solution for now is to recreate the PyMC model, while passing transform=None for every variable, and then call find_hessian on that model with the results from find_MAP that you got in the original model.

with pm.Model() as m:
  x = pm.Normal("x")
  y = pm.HalfNormal("y")
  pm.Normal("llike", x, y, observed=[1, 2, 3])
  _map = find_MAP()

with pm.Model() as untransformed_m:
  x = pm.Normal("x", transform=None)  # Not needed, but doesn't hurt
  y = pm.HalfNormal("y", transform=None)
  pm.Normal("llike", x, y, observed=[1, 2, 3], transform=None)  # Not needed, but doesn't hurt
  find_hessian(_map)`

opherdonchin · November 7, 2022, 11:37am

Thanks @ricardoV94 ! That’s less to explain to the students.

aloctavodia · November 7, 2022, 11:42am

Just to clarify that Bambi performs the computations on the unbounded space and then returns the results on the original scale. I agree that showing the internals to students could be too much.

opherdonchin · November 7, 2022, 12:26pm

That makes sense because bambi is actually returning samples and not a mean and covariance matrix.

No wait: those samples are from a Gaussian fit to the transformed and not the untransformed space. Right?

Opher

ricardoV94 · November 7, 2022, 4:39pm

From the code, yes, they are from a gaussian fit to the transformed (unconstrained space). But when they later call idata = _posterior_samples_to_idata(samples, self.model), the sampled values are transformed back to the constrained space.

Topic		Replies	Views
Discrepency Between Hessian Estimation of Std. Dev. and Sampled Estimation v5	6	53	April 14, 2025
Using find_MAP and hessian to find frequentist standard errors, t-stats and p-values Questions	2	1458	March 31, 2020
A little change to the sample code of MAP, however Questions	10	1117	April 1, 2018
Difference between 'jitter+adapt_diag' and 'adapt_diag'? Questions	2	1226	April 2, 2019
New To PyMC3 \| Logistic Regression - Bug Questions bug	4	788	November 4, 2020

Find_hessian version differences

Related topics