Hi everyone! I’m new to PyMC and have been doing a lot of reading and tutorials recently.
For my first real model with PyMC I’m looking to implement a multivariate linear regression.
The context is as follows:
I have 30 UV-vis spectra, each spectrum has about 130 signal values, one for each wavelength. These spectra are my calibration set, both the signals and the concentration of each of the nine chemical species present are known by design. sneak peek of the data.
As a chemist my typical workflow with a Frequentist model:
1- Infer from Y (signals) and X (concentrations) the parameters of the linear model, alpha (offset) and beta (sensibility coefficients). In this case for the multivariate approach, alpha would be a row vector and beta a matrix of coefficients. Y ~ A + B*X + epsilon
2- With the parameters, infer X from Y in new samples with unknown concentrations.
I have seen examples of multivariate linear models in PyMC, but not in this particular context.
In the calibration (step 1), both X and Y are known. In the Bayesian context, Y would be modeled as normally distributed and dependent on the value of X and the parameters, with X being treated as a fixed variable with no uncertainty.
But in the inverse prediction (step 2) I need to treat X as a random variable conditional on the value of Y for the unknown samples and the values of the parameters from step 1.
I feel that I can manage step 1, but I’m pretty clueless about how to achieve step 2 from there. I need to change the data Y and switch somehow X from a fixed variable to a random one and infer its distribution for each unknown sample.
If anyone has a similar example or can guide me about the best way to solve this problem I would greatly appreciated it!