Change of Variables

I have a question regarding a change of variables on the observed space.

I am setting up a pymc3 model that regresses on the scaled log-odds of a success-rate variable and then applies the inverse transformation to the posterior so the results can be interpreted on the original scale. Formally, I have an observed variable \theta \in (0, 1) and I apply the transformation h = f \circ g, where g(\theta) = \text{logit}(\theta) and f(x) = \frac{x - m}{s}, with m = \overline{\text{logit}(\theta)} and s^2 = \frac{1}{n}\sum_{i=1}^n \left(\text{logit}(\theta_i) - \overline{\text{logit}(\theta)}\right)^2. I then fit the model h(\theta) \sim \text{N}(X\beta, \sigma) and apply h^{-1} directly to the resulting posterior.

That is, my code would look something like this:

import numpy as np
import pymc3 as pm
import scipy.special

success_rate = np.array([0.01, 0.02])

# Transform to the standardised log-odds scale: h = f o g.
logodds = scipy.special.logit(success_rate)
m = logodds.mean()
s = logodds.std()
observed = (logodds - m) / s

design_matrix = np.array([[1, 0, 0], [0, 1, 0]])

with pm.Model() as model:
    # Declare regression coefficients beta (standard-normal prior here as a placeholder)....
    beta = pm.Normal('beta', mu=0., sd=1., shape=design_matrix.shape[1])
    mu = pm.math.dot(design_matrix, beta)  # design matrix above is dense, so use a dense dot
    sigma = pm.HalfNormal('sigma', sd=1)
    likelihood = pm.Normal('likelihood', mu=mu, sd=sigma, observed=observed)

with model:
    trace = pm.sample()
    posterior = pm.sample_posterior_predictive(trace)['likelihood']

# Map the posterior predictive draws back to the original scale: logit inverse is expit.
predictions = scipy.special.expit(s * posterior + m)

I was following along with the Stan case study here: https://mc-stan.org/users/documentation/case-studies/mle-params.html, which mentions that when a change of variables occurs, a Jacobian adjustment is needed to preserve the probability mass under the change.

My question is: how can I correct the procedure above to take the Jacobian adjustment into account?

The Jacobian is only used when computing probability densities. Since you’re only mapping points, there is no need to take a Jacobian into account.

Because sample_posterior_predictive generates a set of (hopefully!) i.i.d. samples \theta_1, \dots, \theta_k from the marginal posterior \theta_\mathrm{post}, the transformed sample (g^{-1}\circ f^{-1})(\theta_1), \dots, (g^{-1}\circ f^{-1})(\theta_k) is an i.i.d. sample from the transformed posterior.
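
To see this concretely, here is a minimal sketch (separate from your model; it uses a standard normal and the exponential map instead of the scaled logit): pushing the samples through the transform already gives draws from the transformed distribution, and no Jacobian is ever applied to the samples themselves.

import numpy as np
import scipy.stats

rng = np.random.default_rng(0)
theta = rng.normal(size=100_000)   # i.i.d. draws from N(0, 1)
phi = np.exp(theta)                # mapped points: i.i.d. draws from LogNormal(0, 1)

# A density estimate of the mapped samples matches the analytic transformed density.
hist, edges = np.histogram(phi, bins=200, range=(0, 10), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
print(np.abs(hist - scipy.stats.lognorm.pdf(centers, s=1)).max())  # small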

Now, if you want to compute the PDF of \theta_\mathrm{post} on the observed scale, you need to know how the measure (volume) changes, and so you need the Jacobian. Letting \varphi = w(\theta_\mathrm{post}) with w = h^{-1}, the density of \varphi involves the Jacobian J_w of w:

f_\varphi(\varphi) = \frac{f_{\theta_\mathrm{post}}(h(\varphi))}{|J_w[h(\varphi)]|}

To be clear, \theta_{\mathrm{post}} refers to the distribution on the transformed scale, while \varphi refers to the distribution on the observed scale. And, depending on your choice of priors, f_{\theta_\mathrm{post}} may not even have a closed form.
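
Concretely, for the h defined above, h(\varphi) = \frac{\text{logit}(\varphi) - m}{s} and w = h^{-1}, so |J_w[h(\varphi)]| = 1 / h'(\varphi) = s\,\varphi(1 - \varphi), and the density on the observed (0, 1) scale works out to

f_\varphi(\varphi) = f_{\theta_\mathrm{post}}\!\left(\frac{\text{logit}(\varphi) - m}{s}\right) \cdot \frac{1}{s\,\varphi(1 - \varphi)}

That is the adjustment you would need if you wanted densities, rather than samples, on the original scale.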


So it sounds like applying h^{-1} to all of the i.i.d. posterior draws is the right way to interpret the inference on the observed scale.

Meaning for record i, h^{-1}(\theta_i) is the posterior distribution of predicted outcomes in the observed space, correct?

h^{-1}(\theta_i) is a sample from the posterior distribution of predicted outcomes, in the observed space.
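
For instance (assuming predictions from the code above, which sample_posterior_predictive returns with shape (n_draws, n_records)), per-record summaries on the observed scale are just column-wise statistics:

import numpy as np

# Column i of `predictions` holds the posterior predictive draws for record i,
# already mapped back to the (0, 1) scale.
post_mean = predictions.mean(axis=0)
ci_low, ci_high = np.percentile(predictions, [2.5, 97.5], axis=0)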


Sorry, I meant that \theta_i is the vector of posterior draws for record i, i.e. it contains all k draws from the posterior.

Edit: Ahh, re-reading I understand your argument.