Change of Variables

I have a question regarding a change of variables on the observed space.

I am setting up a pymc3 model to perform a regression on the scaled log-odds of a success rate variable, and then apply the inverse transformation to the posterior to interpret the results in the original variable space. Formally, I have an observed variable \theta \in (0, 1) and I apply the transformation h = f \circ g where g(\theta) = \text{logit}(\theta) and f(\theta) = \frac{\theta - m}{s}, with m = \bar{\text{logit}(\theta)} and s^2 = \frac{1}{n}\sum_{i=1}^n (\text{logit}(\theta_i) - \bar{\text{logit}(\theta)})^2. I then perform the inference with the model h(\theta) \sim \text{N}(X\beta, \sigma) and obtain a posterior distribution to which I apply h^{-1} directly.

That is, my code would look something like this:

import numpy as np
import scipy.special
import pymc3 as pm

success_rate = np.array([0.01, 0.02])
logodds = scipy.special.logit(success_rate)
m = logodds.mean()
s = logodds.std()
observed = (logodds - m) / s
design_matrix = np.array([[1, 0, 0], [0, 1, 0]])

with pm.Model() as model:
    # Regression coefficients, one per design-matrix column.
    beta = pm.Normal('beta', mu=0, sd=1, shape=3)
    mu =, beta)
    sigma = pm.HalfNormal('sigma', sd=1)
    likelihood = pm.Normal('likelihood', mu=mu, sd=sigma, observed=observed)

with model:
    trace = pm.sample()
    posterior = pm.sample_posterior_predictive(trace)['likelihood']
# logit inverse is expit.
predictions = scipy.special.expit(s * posterior + m) 

I was following along with the Stan documentation on changes of variables, which mentions that if a change of variables occurs, a Jacobian adjustment needs to take place in order to preserve the probability mass under the change.

My question is, how can I correct my above procedure to take the Jacobian adjustment into account?

The Jacobian is only used when computing probability densities. Since you’re only mapping points, there is no need to take a Jacobian into account.

Because sample_posterior_predictive generates a set of (hopefully!) i.i.d. samples \theta_1, \dots, \theta_k from the marginal posterior \theta_\mathrm{post}, the transformed sample (g^{-1}\circ f^{-1})(\theta_1), \dots, (g^{-1}\circ f^{-1})(\theta_k) is an i.i.d. sample from the transformed posterior.
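A quick sketch of this point, using a hypothetical standard normal "posterior" on the logit scale: pushing each draw through the inverse transform yields draws from the transformed distribution, with no density or Jacobian ever computed.

```python
import numpy as np
from scipy.special import expit

rng = np.random.default_rng(0)

# Hypothetical posterior draws on the (logit) scale, for illustration.
z = rng.normal(loc=0.0, scale=1.0, size=100_000)

# Mapping each draw through the inverse transform gives draws from the
# pushforward distribution -- no Jacobian is involved for samples.
x = expit(z)

# Sanity check: expit is monotone, so P(x < 0.5) = P(z < 0) = 0.5.
print(np.mean(x < 0.5))
```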

Now, if you wanted to compute the PDF of \theta_\mathrm{post} on the observed scale, you would need to know how the measure (volume) changes, and so you would need the Jacobian. Letting \varphi = w(\theta_\mathrm{post}) with w = h^{-1}, the density of \varphi involves the Jacobian J_w of w:

f_\varphi(\varphi) = \frac{f_{\theta_\mathrm{post}}(h(\varphi))}{|J_w[h(\varphi)]|}

To be clear, \theta_{\mathrm{post}} refers to the distribution on the transformed scale, while \varphi refers to the distribution on the observed scale. And, depending on your choice of priors, f_{\theta_\mathrm{post}} may not even have a closed form.
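To make the Jacobian formula concrete, here is a sketch assuming a standard normal posterior on the logit scale, so that h = \text{logit} and w = h^{-1} = \text{expit}. Since \text{expit}'(t) = \text{expit}(t)(1 - \text{expit}(t)), the Jacobian evaluated at h(\varphi) is simply \varphi(1 - \varphi), and the resulting density on the observed scale still integrates to 1.

```python
import numpy as np
from scipy.special import logit
from scipy.stats import norm
from scipy.integrate import quad

def density_observed(phi):
    # f_phi(phi) = f_theta(h(phi)) / |J_w[h(phi)]|, with h = logit,
    # w = expit, and J_w evaluated at logit(phi) equal to phi*(1-phi).
    return norm.pdf(logit(phi)) / np.abs(phi * (1.0 - phi))

# The transformed density must still integrate to 1 over (0, 1).
val, _ = quad(density_observed, 0.0, 1.0)
print(round(val, 4))  # should be approximately 1.0
```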


So it sounds like applying h^{-1} to all points from the iid posterior is correct if I’m trying to interpret the inference on the observed scale.

Meaning for record i, h^{-1}(\theta_i) is the posterior distribution of predicted outcomes in the observed space, correct?

h^{-1}(\theta_i) is a sample from the posterior distribution of predicted outcomes, in the observed space.
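As a sketch of how those transformed samples might be summarized per record (the array shape, m, and s values here are hypothetical, standing in for the pymc3 output above):

```python
import numpy as np
from scipy.special import expit

rng = np.random.default_rng(1)

# Hypothetical posterior-predictive draws: 4000 draws x 2 records on
# the standardized logit scale, with assumed scaling constants m, s.
m, s = -4.0, 0.5
posterior = rng.normal(size=(4000, 2))

# Back-transform every draw, then summarize each record (column).
predictions = expit(s * posterior + m)
means = predictions.mean(axis=0)
lo, hi = np.percentile(predictions, [3, 97], axis=0)
for i in range(predictions.shape[1]):
    print(f"record {i}: mean={means[i]:.4f}, 94% CI=({lo[i]:.4f}, {hi[i]:.4f})")
```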


Sorry, I meant that \theta_i is a vector of draws from the posterior for record i, i.e. it contains all k draws from the posterior.

Edit: Ahh, re-reading I understand your argument.