I'm building a copula model that includes a transformation of the observed data. I have a good architecture that I'm mostly happy with, except for the Jacobian adjustment, where I keep coming up against conceptual and implementation roadblocks. This is a little related to Jacobian adjustment, and I've implemented everything so far in pymc v5.6.1.
I've several questions and will probably post more here, but will start simple…
For a 1D FreeRV which is transformed (i.e. forget about my longer term issue on transforming the observed), I believe I can apply a Jacobian transform directly, see my Notebook Gist here: 991_mre_jacobiandet_issue · GitHub
Several questions here already(!)
Is this a faithful port of Bob Carpenter’s and @junpenglao’s work? (I think it is)
Are the effects that I'm seeing reasonable? I'm actually surprised to see the Model B variants with the explicit Jacobian adjustment produce a posterior theta that's further from the empirical mean…
The jacobian=True in compile_logp refers to the transformations done to RVs that don't have support on \mathbb R. For example, the log transformation of a half-normal to change its support from \mathbb R^+ to \mathbb R. See here, for example. Model A will have a transformation, a log-odds transformation of theta. You can check these in the mdla1.rvs_to_transforms dictionary. You can manually disable the transform by passing transform=None to pm.Beta. (But I'm not sure why you would need to?)
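A quick way to see both pieces (a sketch with hypothetical variable names, not taken from the Gist):

import pymc as pm

with pm.Model() as mdl:
    theta = pm.Beta("theta", alpha=2, beta=2)                          # auto-transformed
    theta_raw = pm.Beta("theta_raw", alpha=2, beta=2, transform=None)  # left on (0, 1)

# The automatic transform (or None) assigned to each free RV
print(mdl.rvs_to_transforms)

# Model logp with and without the jacobian terms of those transforms
logp_with_jac = mdl.compile_logp(jacobian=True)
logp_no_jac = mdl.compile_logp(jacobian=False)
point = mdl.initial_point()
print(logp_with_jac(point), logp_no_jac(point))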
I am not sure what you are thinking about in terms of jacobian adjustment. In all the examples you set up, you have some more or less specified prior, and a likelihood that eventually overwhelms the prior. It doesn't really matter how you defined that prior, as long as it allows for values matching a 0.3 mean.
Where do you need a jacobian? When you are sampling a variable according to a distribution by proposing values on one space and transforming them to another. For example, if you want to sample a unit uniform variable by proposing values on the (-inf, +inf) range, you need two things:
a transform that brings your unconstrained proposal to the unit range (like a log-odds transform), and
an adjustment for the fact that uniform draws on the real line do not correspond to uniform draws after the log-odds transform (spelled out just below).
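To spell that adjustment out for the log-odds case: if z is the unconstrained proposal and x = \operatorname{invlogit}(z), then \left|\frac{dx}{dz}\right| = x(1 - x), so the log-density correction is \log x + \log(1 - x), which is exactly the term added as a Potential in the example below.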
For PyMC, jacobian terms are only relevant for priors (and added automatically), but you seem to be thinking of them in relation to posteriors?
One example where you would need it is if you wanted to sample that uniform variable on the unconstrained space, but had specified your prior as a Flat:
import arviz as az
import pymc as pm

with pm.Model() as m:
    # A very laborious way of writing x = pm.Uniform("x")
    x_raw = pm.Flat("x_raw")
    x = pm.Deterministic("x", pm.math.invlogit(x_raw))
    adj = pm.Potential("adj", pm.math.log(x) + pm.math.log(1 - x))
    idata = pm.sample()

az.plot_posterior(idata, var_names=["x"])
But note that the jacobian adjustment becomes irrelevant once the posterior distribution is mostly determined by the likelihood (there was no likelihood in the model above). All that happens without it is that I've specified a slightly different prior than the one I had in mind.
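For example (my own sketch, not part of the Gist): with a reasonably large Bernoulli dataset, the posterior of x is essentially the same whether or not the Potential is included:

import numpy as np
import pymc as pm

rng = np.random.default_rng(1)
y = rng.binomial(1, 0.3, size=500)

def fit(with_adjustment):
    with pm.Model():
        x_raw = pm.Flat("x_raw")
        x = pm.Deterministic("x", pm.math.invlogit(x_raw))
        if with_adjustment:
            # makes the implied prior on x exactly Uniform(0, 1)
            pm.Potential("adj", pm.math.log(x) + pm.math.log(1 - x))
        pm.Bernoulli("y", p=x, observed=y)
        return pm.sample(random_seed=1, progressbar=False)

idata_adj = fit(True)
idata_flat = fit(False)
# Both posteriors of x concentrate around ~0.3; the prior difference is negligible here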
@jessegrabowski Yeah, I suspected that compile_logp was being clever under the hood and applying adjustments… This ModelA1 is purely for later comparison to see if I can make the Jacobian adjustment happen in Models B1 and B2.
@ricardoV94 My intention here is to get some baseline knowledge before tackling my problems yet to come:
A 2D Jacobian adjustment for free RVs, which I can't make work
A 2D Jacobian adjustment of transformed observed data
… All quite tricky to get my head around
Just to rephrase then, is the adjustment really just “fiddling around the edges” when there’s a large amount of data?
I’ll rerun now with a smaller dataset and see what happens (and also set ModelA1 to transform=None)
Just to rephrase then, is the adjustment really just “fiddling around the edges” when there’s a large amount of data?
Yes, like any prior choice. As long as it has nonzero mass over the true value, it will converge to a normal centered around it with infinite data (and a sufficiently “nice” model).
My hunch is that you're thinking of jacobians in a setting where they matter (copulas? weakly informative likelihood?), but trying to understand them in one where they don't (highly informative likelihood over a "misspecified" prior)?
Gotcha, that makes sense. And yes that’s exactly what I’m trying to do: understand the concepts first with something easy and then progress to the copula issue
The model is exactly the same with transform=None or left as default; it's just that NUTS can't really sample well without the transform. A Slice sampler would give you the same results.
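For instance, something along these lines (a minimal example of my own, not from the Gist) samples an untransformed Beta with Slice and recovers the same posterior the transformed version gives with NUTS:

import pymc as pm

with pm.Model() as m:
    theta = pm.Beta("theta", alpha=2, beta=2, transform=None)  # stays on (0, 1)
    pm.Binomial("y", n=100, p=theta, observed=30)
    # NUTS struggles on the constrained space; Slice handles it fine
    idata = pm.sample(step=pm.Slice())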
Checking the Stan article you shared, note that what they did there is something you can’t really do in PyMC (at least not without hacking under the hood).
parameters {
  real alpha;
}
transformed parameters {
  real<lower=0, upper=1> theta = inv_logit(alpha);
}
model {
  theta ~ uniform(0, 1);
  y ~ bernoulli(alpha);
}
I don’t understand the Stan syntax completely (I would expect theta in the bernoulli, not alpha), but I think it would correspond to something like:
with pm.Model() as m:
    alpha = pm.Flat("alpha")
    theta = pm.Deterministic("theta", pm.math.invlogit(alpha))
    pm.Potential("theta_prior", pm.logp(pm.Uniform.dist(), theta))
    pm.Bernoulli("y", p=theta, observed=y)
Which is trying to give theta a Uniform prior but not quite doing it because the jacobian adjustment is missing (as in my example above).
The jacobian issue that gets discussed so often in Stan circles is a non-concern for PyMC users, because they never write models like the ones above (i.e., defining parameters as Flats, transforming them, and then adding Potentials on the transformed values as an attempt to define their priors).
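(For completeness, the idiomatic PyMC version of that model is just the following, with y standing in for whatever observed data you have; the transform and its jacobian are handled automatically.)

import pymc as pm

y = [0, 1, 0, 0, 1]              # placeholder data
with pm.Model() as m_idiomatic:
    theta = pm.Uniform("theta")  # auto log-odds transform + jacobian under the hood
    pm.Bernoulli("y", p=theta, observed=y)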
Just for good measure, I’ve included a new ModelB1 which uses pymc’s automatic transformation.
Curiously, the posteriors look higher for the Normal (Models B2 and B3) than for the Beta (Model B1), even though they all use a Jacobian adjustment. Do you reckon I can put this down to the different priors in the small-data setting? When I try this with larger data (n=1000), the differences start to disappear, though the Normal posteriors are still slightly higher.
The distinction between pm.Beta(...) and pm.Beta(..., transform=None) is irrelevant. It’s an implementation detail and you get exactly the same posterior as long as the sampler converges. This will work if you use step=pm.Slice().
The default NUTS sampler will complain without the transform and the posterior it returns is not correct. That’s obviously why we automatically transform variables under the hood.
The fact that we do, and that it requires a jacobian adjustment, is an implementation detail. All this is to say that I'm not sure those comparisons are going to help answer the questions you're struggling with.
It sounds like you're trying to break PyMC's automatic reparametrizations to understand how they work for another context, which is fine, but perhaps too indirect? You could just write down a simple Metropolis-Hastings sampler and play directly with that? Perhaps something like (link coming later)
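For example, a bare-bones random-walk Metropolis (just a sketch of my own, not the example I'll link) that targets a Beta(2, 2) variable by proposing on the logit scale; dropping the jacobian term silently changes what you end up sampling:

import numpy as np

def invlogit(z):
    return 1 / (1 + np.exp(-z))

def mh_beta_via_logit(n_draws=20_000, include_jacobian=True, seed=0):
    """Random-walk Metropolis on z = logit(x), targeting x ~ Beta(2, 2)."""
    rng = np.random.default_rng(seed)

    def log_target(z):
        x = invlogit(z)
        lp = np.log(x) + np.log(1 - x)       # unnormalised Beta(2, 2) log-density of x
        if include_jacobian:
            lp += np.log(x) + np.log(1 - x)  # log |dx/dz| for the inverse-logit
        return lp

    z, draws = 0.0, np.empty(n_draws)
    for i in range(n_draws):
        z_prop = z + rng.normal(scale=1.0)
        if np.log(rng.uniform()) < log_target(z_prop) - log_target(z):
            z = z_prop
        draws[i] = invlogit(z)
    return draws

good = mh_beta_via_logit(include_jacobian=True)   # draws look like Beta(2, 2)
bad = mh_beta_via_logit(include_jacobian=False)   # draws come out ~Uniform(0, 1) instead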
That's a fair point. As I said at the top, this is all my attempt to better understand what's going on under the hood so I can diagnose potential reasons for the issues I'm seeing elsewhere, in particular adapt_diag seeming to get stuck during init for NUTS in versions > 5.6.1.
Thanks for all this though; hopefully it's a helpful thread for others too, since the automatic transformations and Jacobian adjustments are great, but it helps to understand them explicitly.