Sorry about that! As they say in CS, there are only two hard things: naming, memory locality, and off-by-one errors.
Following Andrew Gelman et al.'s notation from Bayesian Data Analysis, I’d be tempted to use (x, y) for the observed data and (\tilde{x}, \tilde{y}) for the held-out data. Then you can split \tilde{x} into \tilde{x}^\text{obs} and \tilde{x}^\text{miss}.
How precise do your estimates need to be, and what ESS are you getting after 2 minutes? You may be able to shave a lot of that time off by running fewer iterations and, if you’re fitting similar regressions repeatedly, by warm starting from the previous fit.
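Something along these lines would tell you whether you’re over-sampling, and shows the crude kind of warm start I mean via `pm.sample`’s `initvals` argument. This is only a sketch: I’m assuming you already have your PyMC model as `model`, an InferenceData `idata` from the 2-minute run, and parameters named `alpha`, `beta`, `sigma`.

```python
import arviz as az
import pymc as pm

# Assumes `model` is your existing PyMC model and `idata` is the InferenceData
# from the current 2-minute run; the variable names are placeholders.
print(az.summary(idata, kind="diagnostics"))  # look at ess_bulk / ess_tail

# If ESS is already far more than you need, try a much shorter run, and
# (crudely) warm start by initializing at the previous posterior means.
post_means = idata.posterior.mean(dim=("chain", "draw"))
with model:
    idata_short = pm.sample(
        draws=500, tune=500,
        initvals={"alpha": float(post_means["alpha"]),
                  "beta": float(post_means["beta"]),
                  "sigma": float(post_means["sigma"])},
    )
```

Note that `initvals` only skips re-finding the typical set; the sampler still has to re-adapt its step size and mass matrix during warmup.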
You can use the same model structure in PyMC (this is something you can’t do in Stan, where you’d have to rewrite everything). The difference is that in the first instance (x, y) are observed and you infer, say, \alpha, \beta, \sigma. Then in the second instance you plug in the estimates \widehat{\alpha}, \widehat{\beta}, \widehat{\sigma} along with \tilde{x}^\text{obs}, \tilde{y} as data and impute \tilde{x}^\text{miss}.
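Here’s a minimal sketch of what I mean, with toy data and a single covariate just to show the two instances. The function name `build_model` and the priors are placeholders, and I’m leaning on PyMC’s masked-array mechanism, which turns missing entries of an observed variable into latent variables it samples.

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(0)

# Toy data standing in for yours: covariate x in [0, 1], outcome y, and a
# held-out set where some covariate values are missing (NaN).
N, N_tilde = 200, 50
alpha_true, beta_true, sigma_true = 0.5, 2.0, 0.3
x = rng.uniform(size=N)
y = alpha_true + beta_true * x + sigma_true * rng.normal(size=N)
x_tilde = rng.uniform(size=N_tilde)
y_tilde = alpha_true + beta_true * x_tilde + sigma_true * rng.normal(size=N_tilde)
x_tilde_obs = x_tilde.copy()
x_tilde_obs[rng.uniform(size=N_tilde) < 0.3] = np.nan  # "missing" covariates


def build_model(x_data, y_data, alpha=None, beta=None, sigma=None):
    """One model structure used twice: fit mode (alpha/beta/sigma free)
    and imputation mode (alpha/beta/sigma fixed at plug-in estimates)."""
    with pm.Model() as model:
        if alpha is None:
            alpha = pm.Normal("alpha", 0.0, 1.0)
            beta = pm.Normal("beta", 0.0, 1.0)
            sigma = pm.HalfNormal("sigma", 1.0)
        # NaN entries of x become latent variables via the masked array.
        x_full = pm.Uniform("x", 0.0, 1.0,
                            observed=np.ma.masked_invalid(x_data))
        pm.Normal("y", mu=alpha + beta * x_full, sigma=sigma, observed=y_data)
    return model


# Instance 1: infer alpha, beta, sigma from the fully observed (x, y).
with build_model(x, y):
    idata = pm.sample()

# Instance 2: plug in point estimates, treat y-tilde and the observed part of
# x-tilde as data, and impute the missing x-tilde entries.
post = idata.posterior
with build_model(x_tilde_obs, y_tilde,
                 alpha=float(post["alpha"].mean()),
                 beta=float(post["beta"].mean()),
                 sigma=float(post["sigma"].mean())):
    idata_impute = pm.sample()
```

The draws for \tilde{x}^\text{miss} end up in `idata_impute.posterior` under the variable PyMC creates automatically for the missing entries (the exact name depends on the PyMC version).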
I’m not sure how easy this is in PyMC, but you can also treat this like a “cut”, where instead of point estimates you take posterior draws and plug each one in. There’s a paper by Martyn Plummer (the developer of JAGS) on this: Cuts in Bayesian graphical models. One way to do it is to run MCMC for each of the draws, but that’ll get expensive again.
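If you did want to try the cut by brute force, it would look roughly like this, reusing `build_model` and the data from the previous sketch; the 20 retained draws and 200 iterations per draw are arbitrary choices, and the per-draw loop is exactly what makes it expensive.

```python
import numpy as np
import pymc as pm

# Cut-style imputation sketch: plug each of a thinned set of posterior draws
# of (alpha, beta, sigma) into the imputation model and run a short MCMC.
post = idata.posterior.stack(sample=("chain", "draw"))
rng = np.random.default_rng(1)
keep = rng.choice(post.sizes["sample"], size=20, replace=False)

imputed_posteriors = []
for s in keep:
    with build_model(x_tilde_obs, y_tilde,
                     alpha=float(post["alpha"][s]),
                     beta=float(post["beta"][s]),
                     sigma=float(post["sigma"][s])):
        cut_fit = pm.sample(draws=200, tune=200, chains=1, progressbar=False)
    # The imputed x-tilde draws for this cut draw live in cut_fit.posterior.
    imputed_posteriors.append(cut_fit.posterior)
```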
The model you’re laying down for the covariates (uniform on a hypercube) means you can’t really learn any conditional information about them, so the observed covariates in a row won’t inform the missing ones. Another way to do this would be to set up a multivariate normal and then transform with an inverse logit; that will let you learn dependencies among the covariates. Again, no idea how easy this would be in PyMC.
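Roughly the structure I have in mind is below; the sizes and priors are placeholders, and I haven’t worked out how to wire in partially observed rows, which takes more effort than the masked-array trick above (e.g., conditioning the multivariate normal by hand).

```python
import pymc as pm

# Latent multivariate normal with an LKJ prior on its correlations, pushed
# through inverse-logit so each covariate lands in (0, 1) but correlations
# among covariates can be learned.
N, K = 50, 3  # number of rows and covariates (assumed for illustration)

with pm.Model() as cov_model:
    mu = pm.Normal("mu", 0.0, 1.0, shape=K)
    chol, corr, stds = pm.LKJCholeskyCov(
        "chol", n=K, eta=2.0, sd_dist=pm.Exponential.dist(1.0),
        compute_corr=True,
    )
    z = pm.MvNormal("z", mu=mu, chol=chol, shape=(N, K))
    x = pm.Deterministic("x", pm.math.invlogit(z))  # covariates in (0, 1)
    # A missing covariate now borrows strength from the observed covariates
    # in the same row through the learned correlations.
```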