What you’re talking about is often called “amortized inference” and it’s usually done with a neural network to give you enough power to learn the mapping from data to posterior estimates and variances.
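To make the idea concrete, here is a minimal sketch (not your code; simulate_dataset, the priors, and the choice to feed the raw data into the network are all illustrative assumptions, and it only learns point estimates rather than a full posterior): simulate many datasets from the prior, then train a network to map each dataset to the parameters that generated it, so that at inference time a single forward pass replaces per-dataset fitting.

```python
# Minimal amortized-inference sketch for y = a*x + b + eps*noise (illustrative only).
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n_points = 50  # every simulated dataset has the same size

def simulate_dataset(rng):
    # Draw (a, b, eps) from the prior, then simulate one dataset.
    a, b = rng.normal(0.0, 1.0, size=2)            # normal prior on coefficients
    eps = 1.0 / np.sqrt(rng.gamma(2.0, 1.0))       # eps**2 ~ inverse gamma(2, 1)
    x = rng.uniform(-1.0, 1.0, n_points)
    y = a * x + b + eps * rng.normal(size=n_points)
    return np.array([a, b, eps]), np.concatenate([x, y])

# Training pairs: flattened dataset -> parameters that generated it.
params, data = zip(*(simulate_dataset(rng) for _ in range(10_000)))
net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=300)
net.fit(np.stack(data), np.stack(params))

# At inference time, one forward pass gives amortized estimates of (a, b, eps)
# for a new dataset; no per-dataset fitting required.
theta_true, observed = simulate_dataset(rng)
print(theta_true, net.predict(observed[None, :])[0])
```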
If you stick with a simple linear model and assume an inverse gamma prior on the variance (eps**2) and a normal prior on the regression coefficients, then the posterior is available in closed form (this is the conjugate normal-inverse-gamma setup). If you only care about posterior means, you can also work those out analytically.
Assume that the last n_unseen rows are the ones we want to predict.
Technically, the inverse problem only involves the seen rows (x_seen, y_seen): they give you a posterior over the parameters a, b, eps, which you can then plug in to predict Y_unseen directly. You don't need MCMC for this, which is what your solution is doing; in fact, if you stick to the linear case, everything comes out analytically here.
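For reference, here is a sketch of that closed-form posterior (a normal-inverse-gamma conjugate update for y = a*x + b + eps*noise; the hyperparameters a0, b0, the zero prior mean, and the unit prior precision are assumptions, not taken from your code):

```python
# Conjugate posterior for a linear model y = a*x + b + eps*noise (sketch).
import numpy as np

rng = np.random.default_rng(0)
x_seen = rng.uniform(-1.0, 1.0, 40)                       # stand-in for the seen data
y_seen = 1.5 * x_seen - 0.5 + 0.3 * rng.normal(size=40)

def conjugate_posterior(x_seen, y_seen, a0=2.0, b0=1.0):
    # Prior: (a, b) | eps**2 ~ N(0, eps**2 * I), eps**2 ~ InvGamma(a0, b0).
    n = len(y_seen)
    X = np.column_stack([x_seen, np.ones(n)])              # columns for slope a, intercept b
    Lambda_n = np.eye(2) + X.T @ X                         # posterior precision of (a, b)
    mu_n = np.linalg.solve(Lambda_n, X.T @ y_seen)         # posterior mean of (a, b)
    a_n = a0 + n / 2
    b_n = b0 + 0.5 * (y_seen @ y_seen - mu_n @ Lambda_n @ mu_n)
    return mu_n, Lambda_n, a_n, b_n

mu_n, Lambda_n, a_n, b_n = conjugate_posterior(x_seen, y_seen)
print("posterior mean of (a, b):", mu_n)
print("posterior mean of eps**2:", b_n / (a_n - 1))        # valid for a_n > 1

# Plug in the posterior mean coefficients to get point predictions for unseen rows.
x_unseen = np.linspace(-1.0, 1.0, 10)                      # stand-in for the unseen rows
y_unseen_mean = np.column_stack([x_unseen, np.ones(len(x_unseen))]) @ mu_n
```

Note that with this prior mu_n is just a ridge-regression-style estimate, which is why the posterior-mean-only route is so cheap.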
In math, what you’re trying to do is sample from the posterior predictive distribution:
\displaystyle p(y^\text{unseen} \mid y^\text{seen}) = \int_{\mathbb{R}^2 \times \mathbb{R}_{>0}} p(y^\text{unseen} \mid a, b, \epsilon) \cdot p(a, b, \epsilon \mid y^\text{seen}) \, \text{d}(a, b, \epsilon).
You can do this by taking draws a^{m}, b^{m}, \epsilon^m \sim p(a, b, \epsilon \mid y^\text{seen}) and forming the Monte Carlo estimator
\displaystyle \hat{p}(y^\text{unseen} \mid y^\text{seen}) = \frac{1}{M} \sum_{m=1}^M p(y^\text{unseen} \mid a^m, b^m, \epsilon^m),
or, if you want samples rather than a density estimate, drawing y^{\text{unseen}, m} \sim p(y^\text{unseen} \mid a^m, b^m, \epsilon^m) for each posterior draw.
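Continuing the sketch above (mu_n, Lambda_n, a_n, b_n come from the conjugate-posterior snippet; with your MCMC solution the parameter draws would come straight from the trace instead), posterior predictive sampling looks like this:

```python
# Monte Carlo posterior predictive draws for the unseen rows (sketch, continuing from above).
import numpy as np

rng = np.random.default_rng(1)
x_unseen = np.linspace(-1.0, 1.0, 10)                      # stand-in for the unseen rows
X_unseen = np.column_stack([x_unseen, np.ones(len(x_unseen))])
Lambda_n_inv = np.linalg.inv(Lambda_n)

M = 5_000
y_unseen_draws = np.empty((M, len(x_unseen)))
for m in range(M):
    sigma2 = 1.0 / rng.gamma(a_n, 1.0 / b_n)               # eps**2 ~ InvGamma(a_n, b_n)
    coef = rng.multivariate_normal(mu_n, sigma2 * Lambda_n_inv)  # (a, b) | eps**2, y_seen
    y_unseen_draws[m] = X_unseen @ coef + np.sqrt(sigma2) * rng.normal(size=len(x_unseen))

# Summaries of the posterior predictive: mean and a 90% interval per unseen row.
pred_mean = y_unseen_draws.mean(axis=0)
pred_lo, pred_hi = np.quantile(y_unseen_draws, [0.05, 0.95], axis=0)
```

In the linear-conjugate case the predictive is actually a Student-t you could write down directly, but the Monte Carlo version above carries over unchanged to the MCMC setting.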