How to: calibration -> inverse prediction

Hi everyone! I’m new to PyMC and have been doing a lot of reading and tutorials recently.
For my first real model with PyMC I’m looking to implement a multivariate linear regression.

The context is as follows:
I have 30 UV-vis spectra, each spectrum has about 130 signal values, one for each wavelength. These spectra are my calibration set, both the signals and the concentration of each of the nine chemical species present are known by design. sneak peek of the data.

As a chemist my typical workflow with a Frequentist model:

1- Infer from Y (signals) and X (concentrations) the parameters of the linear model, alpha (offset) and beta (sensibility coefficients). In this case for the multivariate approach, alpha would be a row vector and beta a matrix of coefficients. Y ~ A + B*X + epsilon
2- With the parameters, infer X from Y in new samples with unknown concentrations.

I have seen examples of multivariate linear models in PyMC, but not in this particular context.

In the calibration (step 1), both X and Y are known. In the Bayesian context, Y would be modeled as normally distributed and dependent on the value of X and the parameters, with X being treated as a fixed variable with no uncertainty.
But in the inverse prediction (step 2) I need to treat X as a random variable conditional on the value of Y for the unknown samples and the values of the parameters from step 1.

I feel that I can manage step 1, but I’m pretty clueless about how to achieve step 2 from there. I need to change the data Y and switch somehow X from a fixed variable to a random one and infer its distribution for each unknown sample.

If anyone has a similar example or can guide me about the best way to solve this problem I would greatly appreciated it!

You could have x be a partially observed RV with observed = [x1, x2, x3, nan, nan] and y a completely observed RV. Then the unobserved x will be inferred during mcmc sampling

@junpenglao has a nice talk showing an approach like this here:

1 Like

Thanks for the response! I will definitely check out the video.

However making X a random variable and mixing the calibration set with the unknown set it’s problematic.

During the calibration step X can be considered not random, it’s value is know with very very high precision since those concentrations are by design. In the prediction step it’s the opposite, Xpred it’s unknown and pretty much needs to be treated as random variable.

I would need a way to establish very strict priors on the values of X for the calibration data and very uninformative ones for the unknowns.

The observed part of a partially observed variable has no uncertainty or randomness whatsoever. It’s exactly the same as if you passed x as an array of numbers. The unobserved part is the one that will actually be sampled/inferred.

You can write an equivalent model that doesn’t use automatic imputation. Maybe that’s more understandable. Here I am using something simpler but you should be able to translate to your own model

with pm.Model() as m:
  x_fixed = np.array([1, 2, 3])
  x_unknown = pm.Normal("x_unknown", shape=2)
  x = pm.math.concatenate([x_fixed[None], x_unknown[None]], axis=0)
  y = pm.Normal("y", x, 0.1, observed=np.array([1, 2, 3, 4, 5]))

Thank you! This seem like a promising approach! I will toy with this idea and update the thread if i hit another roadblock.


1 Like