Outlier detection using posterior distributions

mschmidt87 · December 16, 2020, 7:46am

This is a more general question about a modeling idea that I had, not strictly related to PyMC3:

I’d like to do outlier (or novelty) detection on a dataset using the posterior distribution of a Bayesian model that I fit to the data.
My idea is essentially this: a conventional outlier detection method needs one vector per sample in the dataset as its input. Now, suppose my dataset consists of time series with high intrinsic variability. Because of that variability, I can't use the raw data points directly as the input vectors for the outlier detection.
My idea is therefore to first fit a Bayesian model to the data (e.g. some time series model) and then leverage the resulting posterior distributions to construct a feature vector for each data sample.
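To make this concrete, here is a minimal sketch of the pipeline I have in mind. Everything in it is an assumption for illustration: the data are simulated, the model is a conjugate AR(1) with known noise standard deviation (so the posterior is available in closed form and no sampler is needed), and the helper names `simulate_ar1` and `ar1_posterior` are hypothetical. The feature vector is just the posterior mean and standard deviation of the AR coefficient, and a robust z-score stands in for whatever outlier detection method would actually be used.

```python
import numpy as np

def simulate_ar1(phi, n, rng, sigma=1.0):
    """Simulate y_t = phi * y_{t-1} + eps_t with eps_t ~ N(0, sigma^2)."""
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = phi * y[t - 1] + sigma * rng.normal()
    return y

def ar1_posterior(y, sigma=1.0, tau=1.0):
    """Closed-form posterior of phi under the prior phi ~ N(0, tau^2).

    Assumes the noise standard deviation sigma is known; returns the
    posterior mean and standard deviation of phi.
    """
    x, target = y[:-1], y[1:]
    precision = 1.0 / tau**2 + np.sum(x**2) / sigma**2
    mean = (np.sum(x * target) / sigma**2) / precision
    return mean, np.sqrt(1.0 / precision)

# Synthetic data (assumption): five similar series and one with different dynamics.
rng = np.random.default_rng(0)
true_phis = [0.5, 0.5, 0.5, 0.5, 0.5, -0.8]
series = [simulate_ar1(phi, 400, rng) for phi in true_phis]

# One feature vector per series: (posterior mean, posterior sd) of phi.
features = np.array([ar1_posterior(y) for y in series])

# Stand-in outlier detector: robust z-score of the posterior means (median / MAD).
post_means = features[:, 0]
med = np.median(post_means)
mad = np.median(np.abs(post_means - med))
robust_z = 0.6745 * (post_means - med) / mad

most_unusual = int(np.argmax(np.abs(robust_z)))
```

The noisy raw series never enter the detector directly; only the low-dimensional posterior summaries do. In a real application the closed-form fit would be replaced by the actual time series model, and the features could include more posterior summaries than these two.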

For instance, I could quantify the “otherness” of a data sample by computing the KL divergence between that sample's posterior distribution and the posterior distributions of the other samples, or between it and a group-level posterior distribution (in a hierarchical model).
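A rough sketch of how such a score could be computed from posterior draws, under some loud assumptions: each posterior is approximated by a Gaussian via moment matching so that the KL divergence has a closed form, the "group-level" distribution is simply the pooled draws of all other samples (a stand-in for a proper hierarchical posterior), and the draws themselves are synthetic. The function names `gaussian_kl` and `otherness_scores` are hypothetical.

```python
import numpy as np

def gaussian_kl(mu_p, cov_p, mu_q, cov_q):
    """KL(p || q) between two multivariate Gaussians (closed form)."""
    k = mu_p.shape[0]
    cov_q_inv = np.linalg.inv(cov_q)
    diff = mu_q - mu_p
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    return 0.5 * (
        np.trace(cov_q_inv @ cov_p)
        + diff @ cov_q_inv @ diff
        - k
        + logdet_q
        - logdet_p
    )

def otherness_scores(draws):
    """Score each sample by KL(own posterior || pooled posterior of the others).

    draws has shape (n_samples, n_draws, n_params). Each posterior is
    replaced by a moment-matched Gaussian, which is an approximation.
    """
    n_samples, _, n_params = draws.shape
    scores = np.empty(n_samples)
    for i in range(n_samples):
        own = draws[i]
        others = np.delete(draws, i, axis=0).reshape(-1, n_params)
        scores[i] = gaussian_kl(
            own.mean(axis=0), np.atleast_2d(np.cov(own, rowvar=False)),
            others.mean(axis=0), np.atleast_2d(np.cov(others, rowvar=False)),
        )
    return scores

# Synthetic posterior draws (assumption): six samples, two parameters each,
# where the last sample's posterior sits far away from the rest.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1],
                    [0.05, 0.05], [-0.05, -0.1], [3.0, 3.0]])
posterior_draws = centers[:, None, :] + 0.1 * rng.normal(size=(6, 500, 2))

kl_scores = otherness_scores(posterior_draws)
candidate = int(np.argmax(kl_scores))
```

Note that KL divergence is asymmetric, so the direction (own posterior relative to the group, as here, or the reverse) is a modeling choice. With posteriors that are far from Gaussian, the moment-matching step would need to be replaced by something else, for example a sample-based divergence estimate.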

Are there any references for such an approach?
