Solving an inverse problem for a Bayesian linear regression?

Hi, I have a training dataset X, Y with n rows, j features, and k targets. One thing to note is that these are simulation results, where the variables in X are i.i.d. and drawn from uniform distributions over each of the features.

So we have:

f(\mathbf{x}):\mathbb{R}^j \to \mathbb{R}^k

X \sim U(0,1)^{n\times j}

Y = f(X) \in \mathbb{R}^{n\times k}

We are fitting a simple linear model for now:

Y=AX+b+\epsilon

I additionally have a dataset Y' with m rows where only the targets are known, and not the features. The goal is to get distributions/posterior samples over what the corresponding X' should look like.

So far, I’ve been able to set up the problem without any issues really by just specifying a prior over the unknown features and solving the linear regression and the inverse design simultaneously:

import numpy as np
import pandas as pd
import pymc as pm

n = 4000
n_features = 5
n_targets = 3
n_unseen = 1

# Generate synthetic data
X = np.random.uniform(low=0, high=1, size=(n, n_features))
coeffs = np.random.uniform(low=-10, high=10, size=(n_features, n_targets))
intercepts = np.random.uniform(low=-10, high=10, size=n_targets)
noise_mag = np.random.uniform(low=0, high=0.3, size=n_targets)
noise = np.random.normal(0, noise_mag, size=(n, n_targets))
X_lin = np.dot(X, coeffs)
Y = X_lin + intercepts + noise


with pm.Model() as m:
    # Priors for intercepts
    b = pm.Normal("b", 0, 10, shape=(n_targets,)) 

    # Priors for coefficient matrix
    a = pm.Normal("a", 0, 10, shape=(n_targets, n_features))

    # Priors for measurement noise standard deviations
    eps = pm.Exponential("eps", 1, shape=n_targets)

    # assume that the last n_unseen rows are the ones we want to solve the inverse problem for
    x_seen = X[:-n_unseen]
    # prior for the unseen features
    X_unseen = pm.Uniform("X_unseen", lower=0, upper=1, shape=(n_unseen, n_features))

    # compute the expected values for the seen and unseen
    x = pm.math.concatenate([x_seen, X_unseen])
    mu = pm.math.dot(x, a.T) + b

    # Likelihoods for observed data
    for i in range(n_targets):
        y_obs = pm.Normal(f"y{i}_obs", mu=mu[:, i], sigma=eps[i], observed=Y[:, i])
    # y_obs = pm.Normal("y_obs", mu=mu, sigma=eps, observed=Y)

    # Sampling
    trace = pm.sample(1000, chains=2, tune=1000, return_inferencedata=True)

This is working pretty much as intended. It’s a little slow once n or n_features gets large, but that seems expected. However, because I need to use this model in production to solve the inverse design problem many times, being slow (i.e. > 2 min) is a non-starter. For context, the real datasets I will be training these models on have about 20k-60k rows.

It seems reasonable to me that I should be able to split this up into two steps.

  1. Train the Bayesian linear regression model (which can take arbitrarily long, since it only needs to be done once).
  2. When it’s time to solve an inverse design problem, use the trained Bayesian linear regression model to discover the posteriors for the unknown features.

I haven’t quite been able to figure out the right way to architect that with PyMC. Should it be two separate models, where one of them somehow ingests the posterior from the first model? Should it be done with a single model using data containers, where X_seen and Y_seen are immutable and Y_unseen is mutable? If I set it up this way, will it avoid trying to relearn the posteriors for all the coefficients and intercepts/noise magnitudes?

I couldn’t find a good clear example of setting up a re-usable model for this.

I know one trivial solution is to just fit a regular linear regression model, and then it’s easy to set up the inverse design problem in a very re-usable way, but I was having trouble figuring out the right way to split up the Bayesian linear regression training and the inverse design. Thanks in advance!

It looks like no one else is going to answer, so I’ll do the best I can to help! I have a couple of comments after looking at your code. In short, I don’t think you can split the inference and the inverse design without making some approximations. If you can get it to work, using your existing model will probably be easiest, and I think there could be ways to speed it up. If that doesn’t work for you, I have an idea for how you could split the model up.

Speed up Existing Model

  1. You’re using a for-loop in your likelihood. You could probably get some speedup by replacing that for-loop with a vectorized expression. You might want to use PyMC’s excellent support for named dimensions to help with this. See this example: Distribution Dimensionality — PyMC 5.18.2 documentation for more info. I made an example to show what I mean:
coords = {'targets': np.arange(n_targets), 'features': np.arange(n_features), 'obs': np.arange(n), 'unseen': np.arange(n_unseen)}


with pm.Model(coords=coords) as m:
    # Priors for intercepts
    b = pm.Normal("b", 0, 10, dims='targets')

    # Priors for coefficient matrix
    a = pm.Normal("a", 0, 10, dims=('targets', 'features'))

    # Priors for measurement noise standard deviations
    eps = pm.Exponential("eps", 1, dims='targets')

    # assume that the last n_unseen rows are the ones we want to solve the inverse problem for
    x_seen = X[:-n_unseen]
    # prior for the unseen features
    X_unseen = pm.Uniform("X_unseen", lower=0, upper=1, dims=('unseen', 'features'))

    # compute the expected values for the seen and unseen
    x = pm.math.concatenate([x_seen, X_unseen])
    mu = pm.math.dot(x, a.T) + b

    # Likelihoods for observed data
    y_obs = pm.Normal("y_obs", mu=mu, sigma=eps, observed=Y, dims=('obs', 'targets'))

    # Sampling
    trace = pm.sample(1000, chains=4, tune=1000, return_inferencedata=True)
  2. You seem to be using the default PyMC sampler. You could likely get a nice speed boost from one of the other samplers: Faster Sampling with JAX and Numba — PyMC example gallery (see the sketch after this list).
  3. You could also try ADVI, which will be a lot faster than NUTS sampling at the cost of some accuracy: Introduction to Variational Inference with PyMC — PyMC example gallery (also shown in the sketch below).
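Here is a minimal sketch of both options (assuming the optional backends, e.g. numpyro or nutpie, are installed); it reuses the model m defined above:

with m:
    # Alternative NUTS backend ("nutpie" and "blackjax" are other options)
    trace = pm.sample(1000, tune=1000, chains=4, nuts_sampler="numpyro")

with m:
    # Mean-field ADVI: much faster than NUTS, but only an approximation of the posterior
    approx = pm.fit(n=50_000, method="advi")
    trace_advi = approx.sample(1000)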

Separate Training and Inverse Design

If the above aren’t good enough, I think the best way to split up the models is to approximate the posterior distributions and use those in the second model. To approximate the distributions, you could try PreliZ, which has support for this kind of thing: Direct elicitation in 1D — PreliZ 0.12.0 documentation.
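As a simpler stand-in for PreliZ (just a sketch; the names a_mean, a_std, etc. are only illustrative), you could moment-match each marginal posterior with a Normal directly from the training trace:

post = trace.posterior

# Collapse the chain/draw dimensions. Treating the marginals as independent
# Normals ignores posterior correlations, so this is only an approximation.
a_mean = post["a"].mean(dim=("chain", "draw")).values    # shape (n_targets, n_features)
a_std = post["a"].std(dim=("chain", "draw")).values
b_mean = post["b"].mean(dim=("chain", "draw")).values    # shape (n_targets,)
b_std = post["b"].std(dim=("chain", "draw")).values
eps_mean = post["eps"].mean(dim=("chain", "draw")).values  # shape (n_targets,)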

Once you have the approximated distributions, you can set up the inverse problem. From the case above, you would have

\mathbf{X} = \mathbf{A}^{-1}(\mathbf{Y} - \mathbf{b})

I think you could then do something like this:

with pm.Model() as inv_m:

    inv_a = ...  # approximated posterior for A^{-1} (this assumes A is square and invertible)
    b = ...      # approximated posterior for the intercepts
    Y = ...      # your observed targets

    sigma = ...  # some model error

    # matrix-vector product rather than elementwise multiplication
    X = pm.Normal("inverse_design", mu=pm.math.dot(inv_a, Y - b), sigma=sigma)
    inverse_design = pm.sample_prior_predictive()

I think that you could actually get away with just doing prior sampling because you’ve inferred all the parameters you would want.
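Since A is generally not square in your setup (3 targets, 5 features), a variant of the same two-stage idea, shown here only as a sketch, is to skip the explicit inverse: keep the uniform prior on the unknown features and run a small MCMC over just those few unknowns, with the stage-1 approximations plugged in (the hypothetical a_mean, b_mean, eps_mean from the moment-matching sketch above, and y_prime standing in for one row of the targets-only data):

y_prime = Y[-1]  # stand-in for one row of the targets-only dataset

with pm.Model() as inv_m2:
    # Unknown features for this single design, same prior as in training
    x_unseen = pm.Uniform("x_unseen", lower=0, upper=1, shape=n_features)

    # Stage-1 point estimates plugged in as constants (wrap them in Normal
    # priors instead if you also want to propagate their uncertainty)
    mu = pm.math.dot(a_mean, x_unseen) + b_mean
    pm.Normal("y_obs", mu=mu, sigma=eps_mean, observed=y_prime)

    # Only n_features unknowns, so this samples quickly no matter how large
    # the training set was
    inv_trace = pm.sample(1000, tune=1000, chains=4)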

Hope this helps, I’d be happy to answer any questions!


What you’re talking about is often called “amortized inference” and it’s usually done with a neural network to give you enough power to learn the mapping from data to posterior estimates and variances.

If you stick with a simple linear model and assume an inverse gamma prior on variance (eps**2) and a normal prior on the regression coefficients, then the posterior is analytical. If you only care about posterior means, you can also work those out analytically.
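For reference, in the simplest conjugate case (a single target, known noise variance \sigma^2, and a \mathcal{N}(0, \tau^2 I) prior on the coefficient vector \beta), the posterior is

\beta \mid X, y \sim \mathcal{N}\left(\Sigma_n X^\top y / \sigma^2,\ \Sigma_n\right), \qquad \Sigma_n = \left(X^\top X / \sigma^2 + I / \tau^2\right)^{-1},

and placing an inverse gamma prior on \sigma^2 keeps the joint posterior in the Normal-Inverse-Gamma family, still with closed-form updates.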

"assume that the last n_unseen rows are the ones we want to solve"

Technically, the parameter inference only involves x_seen (and y_seen): that gives you a posterior over the parameters a, b, eps, which you can then plug in to solve for X_unseen directly; you don’t need to rerun MCMC over the whole training set, as your current solution does. In fact, if you stick to the linear case, everything comes out analytical here.

In math, what you’re trying to do is sample from the posterior distribution of the unknown features (conditioning the parameter posterior on the seen data only, which is the point of splitting the two stages):

\displaystyle p(x^\text{unseen} \mid y^\text{unseen}, y^\text{seen}) = \int p(x^\text{unseen} \mid y^\text{unseen}, a, b, \epsilon) \cdot p(a, b, \epsilon \mid y^\text{seen}) \, \text{d}(a, b, \epsilon).

You can do this by taking draws a^{m}, b^{m}, \epsilon^m \sim p(a, b, \epsilon \mid y^\text{seen}) and forming the Monte Carlo estimator

\displaystyle \hat{p}(x^\text{unseen} \mid y^\text{unseen}, y^\text{seen}) = \frac{1}{M} \sum_{m=1}^M p(x^\text{unseen} \mid y^\text{unseen}, a^m, b^m, \epsilon^m).