Sampling plausible posterior x parameter values in a linear model for a given y

You’re going to have to put some more structure into the model to be able to back out p(x | y) (to answer the question in the gist: yes, that’s the thing you’re looking for).

In a normal regression, you take the right-hand variables to be exogenous, so you can’t really say anything about them (by construction). To say something, we have to model x and y simultaneously. Maybe the simplest way is to just say that:

\begin{align} y &= \alpha_y + \beta_y x + \varepsilon_y \\ x &= \alpha_x + \beta_x y + \varepsilon_x \end{align}

We can write this as a single equation by defining z = \begin{bmatrix} x \\ y \end{bmatrix}, a = \begin{bmatrix} \alpha_x \\ \alpha_y \end{bmatrix}, B = \begin{bmatrix} 0 & \beta_x \\ \beta_y & 0 \end{bmatrix}, and \epsilon = \begin{bmatrix} \varepsilon_x \\ \varepsilon_y \end{bmatrix}; then:

z = a + Bz + \epsilon

We will assume \epsilon \sim N(0, \Sigma). We do this 1) because it makes the math easier, but more importantly 2) because the multivariate normal admits clean conditional distributions over the unobserved variables when some of them are observed. That is, if you observe x, you can work out the conditional distribution for y (and vice versa). Details in a minute.

We need to know the mean of the distribution given the structural equations we’ve defined. Call the mean \mu = \mathbb E[z], and take expectations of both sides of the equation:

\begin{align} \mathbb E[z] &= \mathbb E[a + Bz] \\ \mu &= \mathbb E[a] + \mathbb E[Bz] \\ \mu &= a + B\mathbb E[z] \\ \mu &= a + B\mu \\ \mu - B \mu &= a \\ (I - B) \mu &= a \\ \mu &= (I - B)^{-1} a \end{align}

Where I is the identity matrix. So this gives us a multivariate normal model in which we can treat x and y simultaneously. We just need to put priors on \alpha_x, \alpha_y, \beta_x, \beta_y, \Sigma. Well, sort of. Actually, the model isn’t identified: you have two equations and four unknowns. In general, it would be better if you had extra exogenous variables Z, so that the mean model would be \mu = (I - B)^{-1}(a + \Gamma Z) (derivation left as an exercise to the reader); then you could put some zeros in the \Gamma matrix (i.e. have some of Z act on x but not on y).

I was lazy, so I resolved the identification problem by setting \alpha_y = 0 and \beta_x = 0.
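The model below expects an array data of shape (100, 2) holding the x and y observations, in that column order. If you want something self-contained to run, here’s a purely hypothetical way to simulate data consistent with the structure above (the “true” values are made up):

import numpy as np

rng = np.random.default_rng(0)

# made-up ground truth, consistent with the identification choices alpha_y = 0, beta_x = 0
true_alpha_x, true_beta_y = 1.0, 2.5

x = true_alpha_x + rng.normal(size=100)
y = true_beta_y * x + rng.normal(size=100)

# columns ordered like coords['variable'] = ['x', 'y']
data = np.column_stack([x, y])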

Anyway, here’s a PyMC model:

import numpy as np
import pymc as pm
import pytensor.tensor as pt

coords = {'variable': ['x', 'y'],
          'variable_bis': ['x', 'y'],
          'obs_idx': np.arange(100).astype(int)}

with pm.Model(coords=coords) as m:
    alpha_x = pm.Normal('alpha_x', sigma=10)
    alpha_y = 0.0  # fixed to zero for identification
    alpha = pm.Deterministic('alpha', pt.stack([alpha_x, alpha_y]), dims=['variable'])

    beta_x = 0.0  # fixed to zero for identification
    beta_y = pm.Normal('beta_y')
    beta = pm.Deterministic('beta', pt.stack([beta_x, beta_y]), dims=['variable'])
    B = pt.stack([pt.stack([0.0, beta_x]), pt.stack([beta_y, 0.0])])

    # mu = (I - B)^{-1} a, computed with a solve rather than an explicit inverse
    mu = pm.Deterministic('mu',
                          pt.linalg.solve(pt.eye(2) - B, alpha, check_finite=False),
                          dims=['variable'])

    sd_dist = pm.Exponential.dist(0.5)
    chol, *_ = pm.LKJCholeskyCov('chol', eta=1, n=2, sd_dist=sd_dist)
    cov = pm.Deterministic('cov', chol @ chol.T, dims=['variable', 'variable_bis'])

    # `data` is your (100, 2) array of [x, y] observations
    obs = pm.MvNormal('obs', mu=mu, chol=chol, observed=data, dims=['obs_idx', 'variable'])

    prior_idata = pm.sample_prior_predictive()
    idata = pm.sample()

This should run fine and give you a posterior. But the real magic is in the multivariate normal, which has a closed-form conditional distribution when only some components are observed. If we know \mu and \Sigma (which we do after sampling), then if you observe y, you know the conditional distribution for x. See here for discussion, or read the quite nice wiki.
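Concretely, partition z into an unobserved block z_u and an observed block z_o, with \mu and \Sigma partitioned the same way. The standard result is that, given an observation z_o = d,

\begin{align} z_u \mid z_o = d &\sim N(\hat\mu, \hat\Sigma) \\ \hat\mu &= \mu_u + \Sigma_{uo} \Sigma_{oo}^{-1} (d - \mu_o) \\ \hat\Sigma &= \Sigma_{uu} - \Sigma_{uo} \Sigma_{oo}^{-1} \Sigma_{ou} \end{align}

This is exactly what the model below computes, with \beta = \Sigma_{uo} \Sigma_{oo}^{-1}.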

Suppose you observe x = 23; then you can get the conditional predictions for y as follows:

MISSING_VALUE = -9999

with pm.Model(coords=coords) as prediction_model:
    obs_data = pm.Data('obs_data', np.array([23.0, MISSING_VALUE]), dims=['variable'])
    
    # This seems like overkill, but it allows you to generalize the code to any number of 
    # variables, and any combination of missing/observed. It will also let us reuse
    # the model, changing what is observed and unobserved -- see below
    missing_idx = pt.nonzero(pt.eq(obs_data, MISSING_VALUE), return_matrix=False)[0]
    obs_idx = pt.nonzero(pt.neq(obs_data, MISSING_VALUE), return_matrix=False)[0]
    sorted_idx  = pt.concatenate([missing_idx, obs_idx])
    reverse_idx = pt.argsort(sorted_idx)  # not used below, but handy if you need to undo the sorting
    
    n_missing = missing_idx.shape[0]
    n_obs = obs_idx.shape[0]
    
    data_sorted = obs_data[sorted_idx]
    
    # Dummy flat priors so we get an error if the model tries to sample them
    # (they should be sampled from the posterior we provide)
    mu = pm.Flat('mu', dims=['variable'])
    cov = pm.Flat('cov', dims=['variable', 'variable_bis'])
    
    # Do the marginalization math
    mu_sorted = mu[sorted_idx]
    cov_sorted = cov[sorted_idx, :][:, sorted_idx]
    
    mu_u = mu_sorted[:n_missing]
    mu_o = mu_sorted[n_missing:]
    
    cov_uu = cov_sorted[:n_missing, :][:, :n_missing]
    cov_uo = cov_sorted[:n_missing, :][:, n_missing:]
    cov_oo = cov_sorted[n_missing:, :][:, n_missing:]
    
    cov_oo_inv = pt.linalg.inv(cov_oo)
    beta = cov_uo @ cov_oo_inv
    
    mu_missing_hat = mu_u + beta @ (data_sorted[n_missing:] - mu_o)
    Sigma_missing_hat = cov_uu - beta @ cov_oo @ beta.T    
    
    pm.Deterministic('mu_missing_hat', mu_missing_hat)
    pm.Deterministic('Sigma_missing_hat', Sigma_missing_hat)
    
    # Make a random variable 
    marginal_missing = pm.MvNormal('marginal_missing', mu=mu_missing_hat, cov=Sigma_missing_hat)
    
    # Sample the new random variable, but note that we ***pass the OLD idata***!!!
    idata_marginal = pm.sample_posterior_predictive(idata, 
                                                    var_names=['marginal_missing'])
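As a quick sanity check (a rough sketch only: it plugs in the posterior means of mu and cov rather than using the full posterior, so it won’t match exactly), you can reproduce the same conditioning directly in numpy:

post = idata.posterior
mu_hat = post['mu'].mean(('chain', 'draw')).values    # (2,) -> [mu_x, mu_y]
cov_hat = post['cov'].mean(('chain', 'draw')).values  # (2, 2)

x_obs = 23.0
b = cov_hat[1, 0] / cov_hat[0, 0]                     # Sigma_yx / Sigma_xx
y_cond_mean = mu_hat[1] + b * (x_obs - mu_hat[0])
y_cond_sd = np.sqrt(cov_hat[1, 1] - b * cov_hat[0, 1])
print(y_cond_mean, y_cond_sd)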

Make a plot:

import arviz as az
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
az.plot_posterior(idata_marginal.posterior_predictive, var_names=['marginal_missing'], ax=ax)
ax.set_title(r'Predicted $y$ given $x = 23$')
plt.show()

[Figure: posterior plot of marginal_missing, predicted y given x = 23]

It’s so simple! :fearful: But now you can reuse the prediction_model with pm.set_data to change which variable is unobserved (set it to the MISSING_VALUE constant) or to change the observed values. For example, we can quickly get the conditional distribution for x given y = 1000:

with prediction_model:
    pm.set_data({'obs_data': np.array([MISSING_VALUE, 1000.0])})
    idata_2 = pm.sample_posterior_predictive(idata, var_names=['marginal_missing'])
    
fig, ax = plt.subplots()
az.plot_posterior(idata_2.posterior_predictive, var_names=['marginal_missing'], ax=ax)
ax.set_title(r'Predicted $x$ given $y = 1000$')
plt.show()

[Figure: posterior plot of marginal_missing, predicted x given y = 1000]
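If you want numbers instead of a picture, the usual arviz summary also works on the posterior predictive group (a small sketch):

az.summary(idata_2.posterior_predictive, var_names=['marginal_missing'])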