Let's assume we have 5 datasets of 1k datapoints each.

We can fit quite complex models to these datasets since we have a large number of datapoints; let's say we fit some time-series model incorporating trend, cyclical, and regressor effects in a complex manner.

We also have a dataset with only 30 datapoints on which we would like to perform some inference. We know that this dataset should exhibit the same characteristics as the larger ones, since they stem from the same environment.

I do NOT want to create a hierarchical model; instead, I want to construct informative priors.

The question then arises: should we do this with the complex model that fits the large datasets well and then roll it out on the small dataset? Remember that the complex model might even have more parameters than there are datapoints in the small dataset.

The alternative is to do it the other way around: find a suitable model for the small dataset and then use that model on the larger datasets to create informative priors.

Does anyone have pointers or ideas regarding these two options that could be of help?

What is your ultimate goal here? It’s difficult to know how to approach things if you aren’t entirely sure what you’re hoping to accomplish. If you are confident that the small and large data sets “should exhibit the same characteristics”, it’s not clear to me why you are entertaining the idea of using different models for the two settings. And the idea of “informative priors” is a bit unclear. How informative? Informative based on what?

The ultimate goal is to understand the time-varying effects in the smaller dataset.

The idea is not to use different models for the small and large datasets, but rather the same one.

Since the large datasets have a lot of data, we can fit complex models — models with more parameters than there are datapoints in the small dataset. We then construct our priors from our inference on the larger datasets and use them as informative priors on the small dataset.

The other idea is to use a model that is suitable for the small dataset (thus with far fewer parameters than the one above), do inference with it on the larger datasets, and use that to construct priors for our rollout on the smaller dataset.

The priors are based on inference on the larger datasets.

Reusing posteriors as priors can be tricky, because PyMC needs priors in closed form, while the posteriors we get are collections of draws (essentially histograms).

There are some examples of reusing posteriors as priors by wrapping the draws in a kernel density estimator. This example uses sequential learning, but the same principle would apply to reusing priors across different datasets: Updating priors — PyMC example gallery
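To make the kernel-estimator idea concrete, here is a minimal sketch in plain NumPy/SciPy, with synthetic draws standing in for a real posterior trace. The helper name `interpolated_prior` is hypothetical; in PyMC the resulting support points would be passed to `pm.Interpolated`, as in the linked example.

```python
import numpy as np
from scipy import stats

def interpolated_prior(draws, n_points=200):
    """Turn 1-D posterior draws into (x, pdf) support points.

    Hypothetical helper: in PyMC these points can be fed to
    pm.Interpolated so the new model's prior follows the old posterior.
    """
    kde = stats.gaussian_kde(draws)        # kernel density over the draws
    lo, hi = draws.min(), draws.max()
    pad = 0.1 * (hi - lo)                  # extend slightly beyond the draws
    x = np.linspace(lo - pad, hi + pad, n_points)
    return x, kde(x)

# Synthetic stand-in for a posterior trace of one parameter
rng = np.random.default_rng(0)
posterior_draws = rng.normal(1.5, 0.3, size=2000)
x, pdf = interpolated_prior(posterior_draws)
# In PyMC: pm.Interpolated("beta", x_points=x, pdf_points=pdf)
```

The resulting density integrates to roughly one over the grid and peaks near the posterior mode, which is all `pm.Interpolated` needs.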

You could also just take the mean estimate from the large dataset and use it directly, or, if you need uncertainty, put some normal around it (or just use the closest normal fit to the draws). This is the gist of what this fancier utility does: histogram_approximation — pymc_experimental 0.0.13 documentation
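Moment-matching a normal to the draws is even simpler; a sketch with synthetic draws standing in for the real posterior (the `pm.Normal` call in the comment is how you would then use it in PyMC):

```python
import numpy as np

# Synthetic stand-in for posterior draws of one parameter
rng = np.random.default_rng(1)
draws = rng.normal(0.8, 0.2, size=4000)

# Closest normal fit: match the first two moments of the draws
mu = draws.mean()
sigma = draws.std(ddof=1)

# Optionally widen sigma (e.g. sigma *= 2) to make the prior less dogmatic
# In PyMC: pm.Normal("beta", mu=mu, sigma=sigma)
```

Widening sigma is a cheap hedge against the large datasets not being perfectly representative of the small one.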

It’s also worth keeping the boring option in mind: fit everything together. Complex effects shouldn’t be an issue for the smaller subsets if you model them in a way that allows for hierarchical regularization. Sometimes the speed gain from using posterior approximations as priors doesn’t overcome the precision loss. It’s definitely worth exploring (the downside being that this is all very exploratory, even research-wise).
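As a toy illustration of why fitting everything together protects the small group, here is an empirical-Bayes-style shrinkage of group means toward the grand mean, in plain NumPy with synthetic data. This is only a caricature of hierarchical regularization — an actual PyMC model would express it with a shared hyperprior — but it shows the key mechanism: the 30-point group gets pulled hardest toward the pooled estimate.

```python
import numpy as np

rng = np.random.default_rng(2)
# Five large groups and one small one, all sharing the same true mean
groups = [rng.normal(1.0, 0.5, size=n) for n in (1000,) * 5 + (30,)]

means = np.array([g.mean() for g in groups])
ns = np.array([len(g) for g in groups])
sigma2 = 0.5 ** 2                 # within-group variance (known in this toy)
tau2 = means.var(ddof=1)          # crude between-group variance estimate

# Shrinkage weight on the group's own mean: small n -> small weight,
# so sparse groups are pulled harder toward the grand mean
w = tau2 / (tau2 + sigma2 / ns)
pooled = w * means + (1 - w) * means.mean()
```

The same pull-toward-the-shared-structure is what a hierarchical prior buys you automatically, without having to export posteriors between models.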