I’d like to compare predictions from a few different models to predictions of a parameter-free model. In more detail:

I’m comparing various models of similarity judgments between pairs of words, so the DV is a Likert-scale judgment from 1 (the pair of words is not at all similar) to 7 (the pair is very similar). My parameter-free model is just the cosine similarity between the two words’ word2vec embeddings; you can evaluate it by correlating the cosines with the Likert judgments. The models I’m fitting basically learn weights on the individual word2vec dimensions.

I’d like to compare out-of-sample prediction accuracy between plain cosine similarity and these fitted models, and show that the fitted models predict more accurately. Previously I did that with explicit cross-validation, getting out-of-sample Spearman correlations for each model. But refitting the model for k folds is costly, so I’d rather avoid it. I think I might be able to use something like the Vehtari PSIS-LOO procedure, but I would need the approximate out-of-sample predictions themselves, rather than the LOO score that pm.loo() returns.
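To make the baseline concrete, here is a minimal sketch of how I evaluate the parameter-free model. All arrays are random stand-ins for my real embeddings and ratings:

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Hypothetical stand-ins: word2vec embeddings for each word in a pair
# (n_pairs x 300) and the 1-7 Likert similarity ratings.
n_pairs, dim = 200, 300
emb_a = rng.normal(size=(n_pairs, dim))
emb_b = rng.normal(size=(n_pairs, dim))
ratings = rng.integers(1, 8, size=n_pairs)  # 1..7 inclusive

# Parameter-free model: plain cosine similarity per pair.
cosine = np.sum(emb_a * emb_b, axis=1) / (
    np.linalg.norm(emb_a, axis=1) * np.linalg.norm(emb_b, axis=1)
)

# Evaluate by rank-correlating model output with the human judgments.
rho, _ = spearmanr(cosine, ratings)
```

The fitted models replace the plain dot product with a weighted one, but the evaluation is the same Spearman correlation.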

Is something like what I have in mind feasible with pymc3?
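For concreteness, here is roughly the computation I have in mind, sketched with random stand-in arrays. In the real case, log_lik (pointwise log-likelihoods) and y_rep (posterior-predictive draws) would come from the fitted PyMC3 model's trace; I'm assuming ArviZ's psislw, which expects samples on the last axis, can supply the smoothed importance weights:

```python
import numpy as np
from arviz import psislw
from scipy.special import logsumexp

rng = np.random.default_rng(1)

# Hypothetical shapes: N observations, S posterior draws.
N, S = 100, 500

# log p(y_i | theta_s) for each observation i and draw s
# (samples on the last axis, as psislw expects).
log_lik = rng.normal(loc=-1.0, scale=0.1, size=(N, S))
# Posterior-predictive draws of each pair's predicted similarity.
y_rep = rng.normal(loc=4.0, size=(N, S))

# PSIS-smoothed log importance weights for leaving each observation
# out; pareto_k flags observations where the approximation is
# unreliable (k > 0.7).
log_w, pareto_k = psislw(-log_lik)

# Normalize per observation and average the predictive draws under
# the LOO weights to get approximate out-of-sample predictions.
w = np.exp(log_w - logsumexp(log_w, axis=1, keepdims=True))
loo_pred = (w * y_rep).sum(axis=1)
```

I would then Spearman-correlate loo_pred with the held-out ratings, exactly as I did with the cross-validated predictions. Is that a sensible way to get the approximate out-of-sample predictions, or is there a built-in way to do it?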