It looks like no one else is going to answer, so I’ll do the best I can to help! I have a couple of comments after looking at your code. In short, I don’t think you can split the inference and inverse design without making some approximations. If you can get it to work, using your existing model will probably be easiest, and I think there could be ways to speed it up. If that doesn’t work for you, I have an idea for how you could split the model up.
Speed up Existing Model
- You’re using a for-loop in your likelihood. You could probably get some speedup by replacing that for-loop with a vectorized expression. PyMC’s excellent support for named dimensions can help with this; see Distribution Dimensionality — PyMC 5.18.2 documentation for more info. I made an example to show what I mean:
import numpy as np
import pymc as pm

# X, Y, n, n_targets, n_features and n_unseen come from your data, as in your original model
coords = {
    'targets': np.arange(n_targets),
    'features': np.arange(n_features),
    'obs': np.arange(n),
    'unseen': np.arange(n_unseen),
}

with pm.Model(coords=coords) as m:
    # Priors for intercepts
    b = pm.Normal("b", 0, 10, dims='targets')
    # Priors for coefficient matrix
    a = pm.Normal("a", 0, 10, dims=('targets', 'features'))
    # Priors for measurement noise standard deviations
    eps = pm.Exponential("eps", 1, dims='targets')
    # assume that the last n_unseen rows are the ones we want to solve the inverse problem for
    x_seen = X[:-n_unseen]
    # prior for the unseen features
    X_unseen = pm.Uniform("X_unseen", lower=0, upper=1, dims=('unseen', 'features'))
    # compute the expected values for the seen and unseen rows
    x = pm.math.concatenate([x_seen, X_unseen])
    mu = pm.math.dot(x, a.T) + b
    # Likelihood for the observed data
    y_obs = pm.Normal("y_obs", mu=mu, sigma=eps, observed=Y, dims=('obs', 'targets'))
    # Sampling
    trace = pm.sample(1000, chains=4, tune=1000, return_inferencedata=True)
- You seem to be using the default PyMC sampler. You could likely get a nice speed boost from one of the other samplers: Faster Sampling with JAX and Numba — PyMC example gallery (see the first sketch after this list).
- You could also try ADVI, which will be a lot faster than NUTS sampling at the cost of some accuracy: Introduction to Variational Inference with PyMC — PyMC example gallery (see the second sketch after this list).
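Switching to one of the other NUTS backends should just be a one-argument change to the pm.sample call in the model above (a sketch, assuming you have numpyro installed; "nutpie" and "blackjax" are the other options):

trace = pm.sample(1000, chains=4, tune=1000, nuts_sampler="numpyro")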
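ADVI is a similarly small change. A minimal sketch, reusing the model m from above (the 50,000 iterations are just a placeholder, check that the ELBO has actually converged):

with m:
    # fit a mean-field approximation, then draw samples from it
    approx = pm.fit(n=50_000, method="advi")
    trace_advi = approx.sample(1000)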
Separate Training and Inverse Design
If the above aren’t good enough, I think the best way to split up the model is to approximate the posterior distributions from the first model and use those in the second model. To approximate the distributions, you could try PreliZ, which has support for this kind of thing: Direct elicitation in 1D — PreliZ 0.12.0 documentation.
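For example, if Gaussian approximations of the marginals are good enough, you can moment-match them straight from the first model’s trace (a minimal sketch under that assumption; PreliZ lets you fit other families if the marginals aren’t close to normal):

# summarise the posterior marginals of the first model
post = trace.posterior
a_mu = post["a"].mean(("chain", "draw")).values
a_sd = post["a"].std(("chain", "draw")).values
b_mu = post["b"].mean(("chain", "draw")).values
b_sd = post["b"].std(("chain", "draw")).values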
Once you have the approximated distributions, you can set up the inverse problem. From the linear model above, you would have X ≈ a⁻¹(Y - b).
I think you could then do something like this:
with pm.Model() as inv_m:
    inv_a = ...  # some distribution you fitted
    b = ...      # some other distribution you fitted
    Y = ...      # your observed data
    sigma = ...  # some model error
    X = pm.Normal('inverse design', mu=inv_a * (Y - b), sigma=sigma)
    inverse_design = pm.sample_prior_predictive()
I think that you could actually get away with just doing prior sampling because you’ve inferred all the parameters you would want.
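To make that concrete, here is a minimal sketch for the simplest case of a single target and a single feature, using the moment-matched a_mu, a_sd, b_mu, b_sd from the sketch above; y_target (the output you want to hit) and the 0.1 model error are placeholder assumptions, and I divide by the fitted a rather than fitting a separate distribution for 1/a:

with pm.Model() as inv_m:
    # coefficient and intercept, approximated from the first model's posterior
    inv_a = pm.Normal("a", mu=a_mu[0, 0], sigma=a_sd[0, 0])
    inv_b = pm.Normal("b", mu=b_mu[0], sigma=b_sd[0])
    # rearrange Y = a * X + b  =>  X = (Y - b) / a
    X = pm.Normal("inverse design", mu=(y_target - inv_b) / inv_a, sigma=0.1)
    inverse_design = pm.sample_prior_predictive(1000)

With the full coefficient matrix you would solve the linear system instead of dividing, but the structure is the same.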
Hope this helps, I’d be happy to answer any questions!