Comparing to a parameter-free model?

I’d like to compare predictions from a few different models to predictions of a parameter-free model. In more detail:

I’m comparing various models of similarity judgments between pairs of words, so the DV is a Likert-scale judgment from 1 (the pair of words are not at all similar) to 7 (the pair of words are very similar). My parameter-free model is just the cosine similarity between the two words’ word2vec embeddings – you can evaluate it by correlating the cosine with the Likert judgments of similarity. The models I’m fitting basically learn weights on the dimensions of the word2vec space.

I’d like to compare out-of-sample prediction accuracy between plain cosine similarity and these fitted models, and show that the fitted models predict more accurately. Previously I did this through explicit cross-validation, computing out-of-sample Spearman correlations for each model. But of course, refitting the model for each of k folds is costly, so I’d rather avoid that. I think something like Vehtari’s PSIS-LOO procedure might work, but I would need the approximate out-of-sample predictions themselves, rather than the LOO score that pm.loo() provides.
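To make the idea concrete, here’s a rough numpy sketch of the importance-weighting trick I have in mind: the LOO predictive expectation for observation i can be approximated by reweighting posterior draws with weights proportional to 1/p(y_i | θ_s). I’m using a toy normal-mean model with an analytic posterior in place of my real model; in practice log_lik would come from the fitted PyMC3 model, and PSIS (e.g. arviz’s psislw) would smooth the raw weights below.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n judgments modelled as y ~ Normal(mu, sigma), sigma known
n = 50
sigma = 1.0
y = rng.normal(4.0, sigma, size=n)

# Posterior draws for mu under a flat prior: mu | y ~ Normal(ybar, sigma/sqrt(n))
S = 4000
mu_samples = rng.normal(y.mean(), sigma / np.sqrt(n), size=S)

# Pointwise log-likelihood matrix, shape (n_obs, n_draws)
log_lik = (-0.5 * np.log(2 * np.pi * sigma**2)
           - 0.5 * ((y[:, None] - mu_samples[None, :]) / sigma) ** 2)

# Raw importance weights, w_is proportional to 1 / p(y_i | theta_s);
# PSIS-LOO would Pareto-smooth these before use
log_w = -log_lik
log_w -= log_w.max(axis=1, keepdims=True)   # stabilise before exponentiating
w = np.exp(log_w)
w /= w.sum(axis=1, keepdims=True)           # normalise per observation

# Approximate LOO predictive expectation for each observation
loo_pred = (w * mu_samples[None, :]).sum(axis=1)
```

The weighting is self-normalised per observation, so each row of `w` sums to one and `loo_pred[i]` approximates the posterior mean of mu with observation i left out.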

Is something like what I have in mind feasible with pymc3?

It might not be exactly what you’re looking for, but you could simulate “out-of-sample” data via a posterior predictive check (PPC). This gives you data simulated from the posterior predictive distribution: on each iteration of the PPC, you take a random draw from your unobserved stochastics, then take a random draw from the observed stochastics given those values, creating a replica of your original data. I’m not sure how you’d compare this to your parameter-free model; perhaps you could calculate the pointwise difference between the replicas and your real data, and (hopefully) this quantity will be smaller, on average, than the difference between your data and the predictions from the parameter-free model?
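To sketch the comparison I mean in plain numpy, with random stand-ins for the PPC replicas and the cosine-similarity predictions (in practice the replicas would come from pm.sample_posterior_predictive, and `baseline` from cosine similarity mapped onto the 1–7 scale):

```python
import numpy as np

rng = np.random.default_rng(1)

n, draws = 40, 1000
y = rng.integers(1, 8, size=n).astype(float)        # observed Likert judgments

# Stand-ins: replicas of the data from the PPC, and parameter-free predictions
ppc = y[None, :] + rng.normal(0.0, 0.5, size=(draws, n))
baseline = y + rng.normal(0.0, 2.0, size=n)

# Mean pointwise absolute error of the replicas vs. the baseline predictions
ppc_err = np.abs(ppc - y[None, :]).mean()
base_err = np.abs(baseline - y).mean()
```

If the fitted model really captures the judgments better, `ppc_err` should come out smaller than `base_err` (as it does here, by construction of the stand-ins).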

Hi, @sammosummo. Thanks for the reply, but if I fit the model on a dataset and then run the PPC on that same dataset, it’s not out of sample. What I had been doing was PPC through cross-validation – fitting on the training set, running the PPC on the held-out set, and repeating for each of several train–test splits. I’d like to avoid that because (a) it requires fitting the model k times (I’ve been using k = 5), which is slow, and (b) it doesn’t use all the data to fit the model, which hurts the fitted models’ performance relative to the parameter-free model.
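For reference, the cross-validated procedure I’m trying to replace looks roughly like this, with a trivial mean predictor standing in for refitting the PyMC3 model on each fold:

```python
import numpy as np

rng = np.random.default_rng(2)

n, k = 100, 5
y = rng.normal(4.0, 1.0, size=n)    # stand-in for the similarity judgments

idx = rng.permutation(n)
folds = np.array_split(idx, k)

oos_pred = np.empty(n)
for i, test_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    # stand-in for refitting the model on the k-1 training folds...
    mu_hat = y[train_idx].mean()
    # ...and for the PPC-based prediction on the held-out fold
    oos_pred[test_idx] = mu_hat
```

Each observation gets a genuinely out-of-sample prediction, but the model is refit k times and each fit only sees (k-1)/k of the data.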