Previously, I asked some questions about surrogate models in Pymc3. I was relatively new at the time and so I think I was trying to run before I could walk. I’ve been able to return to the problem more recently and I have a couple of further questions.
I’m going to present my problem below in as general a way as I can put it and then see if anyone has any suggestions. I hope my notation will be understandable.
Say I have some function f(\theta) which is very expensive to evaluate where \theta is a vector of parameters about which I have some uncertainty. The output of f is a vector of values y^*. Which are likely correlated with each other for a given set of parameters \theta. I have some test data, y, which is a vector which corresponds with the outputs of the function f. This test data has some uncertainty associated with it due to measurement error.
I have evaluated f at N different combinations of \theta (say on the order of 10-100 times as many function evaluations as different \theta s) and would like to use my model samples to construct a surrogate model which I can then use to update my uncertainties on my \theta s given my test data y since I can’t sample f directly.
I can see two approaches:
- Estimate the joint PDF of the N samples of f and then sample from this joint PDF where my likelyhood is formed from the sample from my PDF being the mean, my observations are my test data y, and the noise is the noise associated with those observations.
- Form a surrogate model which maps from my parameters \theta to y^* and form my likelyhood using my surrogate’s estimation of y^* as the mean.
I’ve used the first approach to some success but didn’t know how to construct a joint PDF from a collection of samples so I had to assume they were uncorrelated which is a bad assumption. Is it possible to estimate a joint PDF from samples in pymc?
For the second approach, I would want to calibrate the gaussian process (probably using ADVI) in one model and then sample from the surrogates in another model. Does this make sense or should I calibrate and sample all at once? Are there any examples out there were a previously calibrated GP is being sampled from in another model?
I hope my questions are clear.
After thinking about it some more, it seems like option 2 is the better choice. N is on the order of 100 for most situations which would be far less than what would be required to accurately estimate the joint distribution if I could figure out how to interpolate it.
I would, at some point, like to use an interpolation of the posterior as a prior distribution. In this example (Updating priors) the Interpolated class is used to create a new prior distribution. Is this a joint distribution or does Interpolated only produce a marginal distribution from the trace?
Since Interpolated is only univariate I suppose it could only be the marginal. Would extending the functionality to the multivariate case be too difficult?
Not sure I follow exactly, but what you describe in #2 sounds perfectly possible. Do you maybe have an example that’s not quite working yet?
I agree that option 2 is much more straight forward. I actually talked with @junpenglao about doing that in this thread: multidimensional gaussian process. I had misremembered the advice that I had received in that conversation and saw my mistake after asking…
I am anticipating that I will want to do some kind of multivariate interpolation in the future. That’s why I asked my follow-up question.
Oh I see. A multivariate interpolation of a PDF is sort of outside the scope of PyMC3 I think. Instead explicitly taking the posterior and using it as the next models prior, it is equivalent to just model both things simultaneously.
@bwengals I think having the capability to sample from a posterior trace as a prior would be useful though. Sometimes new data is obtained and it would be nice just to update a model, not refit.
Basically this: Feeding Posterior back in as Prior (updating a model) except including the interactions between variables which I think the approach as implemented in the simple example in the included link doesn’t do?
No it doesn’t look like it does. The mean field approximation is only the first moment, and the fullrank approximation is first and second moment. I think it would be difficult to do this properly for a nontrivial scenario, as exactly how the bias you’d incur from a suboptimal interpolation of the posterior-now-prior distribution affects the new posterior would be tough to quantify. This prior/posterior updating works nicely for conjugate distributions, but is certainly tougher using MCMC samples.
If the posterior of your first model is just one of the prior distributions of the second, and youre not just updating your entire model given new data, you could maybe set up a Gibbs sampling scheme where one step is drawing from your old posterior.
Interesting, I will have to look into that.
In my use case sometimes all of the information is not available in one dataset but may require several which become available over a significant passage of real-world time (weeks/months/years). In addition, fitting all of the datasets simultaneously may be computationally prohibitive given the size. If I could store a trace after fitting my model to one set and then use the results of the trace to update my prior distributions when I have new data that would be ideal.
Does that make sense? I don’t want to let my preconceived notion of what a good solution is to blind me to the actual best way forward.
It could make sense, depends on the scenario. I feel like it’s too hard to say without knowing specifics of your problem.