To give you some context:
I am running an RL optimization scheme that involves two steps, the first of which is where my question stems from:
- Obtaining posterior parameter estimates for a nonlinear model that includes certain domain-specific and time-variant components.
- Taking a sample from the previously computed posterior to use as ground truth when optimizing over a horizon.
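To make the two steps concrete, here is a minimal sketch with a toy conjugate normal-normal model (the real models are nonlinear with time-variant components, so this is purely illustrative; all numbers and names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for step 1: a conjugate normal-normal posterior over a single
# mean parameter (the real model is nonlinear and not conjugate).
y = rng.normal(loc=2.0, scale=1.0, size=50)   # hypothetical observed data
prior_mu, prior_sigma = 0.0, 10.0             # assumed weak prior
obs_sigma = 1.0                               # assumed known noise scale

post_var = 1.0 / (1.0 / prior_sigma**2 + len(y) / obs_sigma**2)
post_mu = post_var * (prior_mu / prior_sigma**2 + y.sum() / obs_sigma**2)

# Step 2: draw one sample from the posterior and treat it as the ground-truth
# parameter when simulating the system over the optimization horizon.
theta_star = rng.normal(post_mu, np.sqrt(post_var))
horizon = 10
simulated_truth = rng.normal(theta_star, obs_sigma, size=horizon)
```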
This scheme will be deployed across a wide range of datasets, ranging from 30 to 1000 datapoints. The models have between 5 and 100 parameters, depending on whether certain time-variant and/or domain-specific components are included.
In the worst case, a run with default sampling parameters (PyMC 4) takes approximately 6-7 hours on a Linux machine, with no errors or warnings such as the standard BLAS warning.
Due to the varying dynamics and sizes of the datasets, I need to run a model selection algorithm optimizing metrics such as MSE, AIC, BIC, or LOO. However, given the long runtimes on some datasets, it is intractable to run a Bayesian optimization scheme over MSE, or to run the PyMC model selector and pick a model based on AIC or BIC.
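For context on what I would compute from a fast point-estimate fit: AIC and BIC follow directly from the maximized log-likelihood. A minimal sketch with a toy Gaussian model (everything here is illustrative, not my actual model):

```python
import numpy as np

def gaussian_ll(y, mu, sigma):
    # Log-likelihood of y under N(mu, sigma^2)
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - 0.5 * (y - mu) ** 2 / sigma**2)

rng = np.random.default_rng(1)
y = rng.normal(3.0, 2.0, size=100)

# Point fit by MLE (stand-in for a fast point fit of the real model).
mu_hat, sigma_hat = y.mean(), y.std()   # ddof=0 std is the MLE
ll = gaussian_ll(y, mu_hat, sigma_hat)

k, n = 2, len(y)                        # number of parameters, datapoints
aic = 2 * k - 2 * ll
bic = k * np.log(n) - 2 * ll            # BIC penalizes k more for n > e^2
```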
My idea was to run a Bayesian optimization scheme, but instead of sampling the posterior, perform MAP estimates to speed up the process. Then, once the "best" model is found, it can be refit with a full Bayesian treatment.
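What I mean by the MAP shortcut, as a minimal sketch: replace the sampler with an optimizer over the negative log posterior. Below is a hypothetical two-parameter toy, with SciPy standing in for what `pm.find_MAP` would do on the real model:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
y = rng.normal(1.5, 0.8, size=40)   # hypothetical data

def neg_log_post(theta):
    # Toy Gaussian model with weak priors on mu and log(sigma).
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)
    log_prior = -0.5 * (mu**2 / 100.0 + log_sigma**2 / 4.0)
    log_lik = np.sum(-np.log(sigma) - 0.5 * (y - mu) ** 2 / sigma**2)
    return -(log_prior + log_lik)

# One optimizer run instead of hours of MCMC; orders of magnitude cheaper.
res = minimize(neg_log_post, x0=np.zeros(2), method="L-BFGS-B")
mu_map, sigma_map = res.x[0], np.exp(res.x[1])
```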
I have also considered running the model-selection scheme on a subset of each dataset, but I might then miss certain time-variant components that occur less frequently, which would lead the selector to favor models without such components.
I have noticed, though, that due to the random initialization of MAP estimates, the results can differ substantially for the same model and dataset, leading to a lot of inconsistency. Additionally, I have seen several experienced posters on this forum recommend against relying on MAP.
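One mitigation I am considering for the initialization issue: multi-start optimization, keeping the fit with the best objective. A sketch on a hypothetical double-well negative log posterior (a stand-in for a multimodal model, where a single random init is unstable):

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_post(theta):
    # Hypothetical bimodal objective: minima near t = -2 (global) and t = +2,
    # so a single random start can land in either basin.
    t = theta[0]
    return (t**2 - 4.0) ** 2 + t

# Multi-start: optimize from a fixed grid of inits and keep the best result,
# so the answer no longer depends on one random starting point.
starts = np.linspace(-4.0, 4.0, 9).reshape(-1, 1)
fits = [minimize(neg_log_post, x0=s, method="L-BFGS-B") for s in starts]
best = min(fits, key=lambda f: f.fun)
theta_map = best.x[0]   # lands in the global basin near -2
```

This does not remove the principled objections to MAP (it ignores posterior volume), but it at least makes the point estimate reproducible across runs.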
Does anyone have any ideas on how to proceed with this? Thank you in advance.