Find_hessian version differences

The following code seems to produce different answers under PyMC and PyMC3.

data = np.repeat([0, 1], (10, 3))
with pm.Model() as normal_aproximation:
    p = pm.Beta('p', 1., 1.)
    w = pm.Binomial('w',n=1, p=p, observed=data)
    mean_q = pm.find_MAP()
    std_q = ((1/pm.find_hessian(mean_q, vars=[p]))**0.5)[0]
mean_q['p'], std_q

Under PyMC3, I get:

(array(0.23076923), array([0.11685454]))

Which seems like the right answer.

Under PyMC, I get:

(array(0.23076923), array([0.61282588]))

I don’t see why this should be. Is there some API difference that I didn’t notice?

Opher

I’m not 100% sure yet, but it seems like it used to compute the hessian (or actually negative of the hessian, as I just noticed…) on the untransformed space, but now it computes it on the transformed space. There is also the question if the jacobian terms should be included here, they are not included in the find_MAP call, but are included in the find_hessian call…

It does seem however that the derivatives are correct on the transformed space at least:

data = np.repeat([0, 1], (10, 3))
with pm.Model() as normal_aproximation:
    p = pm.Beta('p', 1., 1.)
    w = pm.Binomial('w',n=1, p=p, observed=data)
    mean_q = pm.find_MAP()

from scipy import optimize

jac = True
logp = normal_aproximation.logp(jacobian=jac)

func = aesara.function(normal_aproximation.value_vars, logp)

d2logp = normal_aproximation.compile_d2logp(jacobian=jac)
hesse = d2logp({"p_logodds__" : mean_q["p_logodds__"]})

x = np.linspace(mean_q["p_logodds__"] - 0.5, mean_q["p_logodds__"] + 0.5)

y = np.array([func(val) for val in x])

p = mean_q["p_logodds__"]

dlogp = normal_aproximation.compile_dlogp(jacobian=jac)
dy = dlogp({"p_logodds__": p})
ddy = d2logp({"p_logodds__": p}).ravel()

plt.plot(x, y)
plt.axvline(mean_q["p_logodds__"])
plt.plot(x, func(p) + (x - p) * dy)
plt.plot(x, func(p) + (x - p) * dy - (x - p) ** 2 * ddy / 2)

Related to Numerical differences between 3.11.4 and 4.0.0b · Issue #5443 · pymc-devs/pymc · GitHub

The issue that @ricardoV94 linked had a couple of lines of code that produced the expected answer:

with pm.Model() as normal_aproximation:
    p = pm.Beta('p', alpha=1., beta=1.)
    w = pm.Binomial('w',n=1, p=p, observed=data)
    mean_q = pm.find_MAP()

    p_value = normal_aproximation.rvs_to_values[p]
    p_value.tag.transform = None
    p_value.name = p.name

    std_q = ((1/pm.find_hessian(mean_q, vars=[p]))**0.5)[0]

It’s not clear to me whether this has not permanently changed the normal_aproximation model.

Opher

Yes that code changes the original model. You could reset the value name and tag to their original variables.

A more elegant solution would probably be to add the option to compute the hessian in the untransformed space in find_hessian.

But I am not sure what is the goal here. Is this just a learning exercise or a really needed feature?

I’m teaching a Bayesian course based partly on @aloctavodia’s Bayesian Analysis in Python, but I’m updating the notebooks to use PyMC instead of PyMC3.

In Chapter 8, he uses the MAP and the Hessian in his code to demonstrate the Gaussian approximation and I needed to update it for the students.

More generally, I can imagine that there is use in getting a Gaussian approximation of the untransfomed mode. Gaussian approximations are quite generally useful.

Opher

1 Like

Hi,

Bambi offers the laplace/quadratic/Gaussian approximation as one of its inference methods. The code is here

That’s good to know. I could just tell them a couple of words about bambi (not a bad idea in any case) and then use this function.

On the other hand, looking at the source, it seems like this code is calling find_hessian on the untransformed parameter. Why can’t this work just the same outside this function (just changing the name to the result of get_untransformed_name.)

Opher

@aloctavodia I think that snippet suffers the same “problem” (assuming you don’t want the hessian on the transformed space) as in this issue with PyMC v4.

Thanks for the example. I think we need to improve our API to allow to easily define expressions on the transformed and untransformed space. I don’t think the old API was that great though…

I think the easiest solution for now is to recreate the PyMC model, while passing transform=None for every variable, and then call find_hessian on that model with the results from find_MAP that you got in the original model.

with pm.Model() as m:
  x = pm.Normal("x")
  y = pm.HalfNormal("y")
  pm.Normal("llike", x, y, observed=[1, 2, 3])
  _map = find_MAP()
with pm.Model() as untransformed_m:
  x = pm.Normal("x", transform=None)  # Not needed, but doesn't hurt
  y = pm.HalfNormal("y", transform=None)
  pm.Normal("llike", x, y, observed=[1, 2, 3], transform=None)  # Not needed, but doesn't hurt
  find_hessian(_map)`

Thanks @ricardoV94 ! That’s less to explain to the students.

1 Like

Just to clarify that Bambi performs the computations on the unbounded space and then returns the results on the original scale. I agree that showing the internals to students could be too much.

1 Like

That makes sense because bambi is actually returning samples and not a mean and covariance matrix.

No wait: those samples are from a Gaussian fit to the transformed and not the untransformed space. Right?

Opher

From the code, yes, they are from a gaussian fit to the transformed (unconstrained space). But when they later call idata = _posterior_samples_to_idata(samples, self.model), the sampled values are transformed back to the constrained space.