Less is more: Building selective anomaly ensembles

  • Stony Brook University

Research output: Contribution to journal › Article › peer-review

118 Scopus citations

Abstract

Ensemble learning for anomaly detection has barely been studied, owing to the difficulty of acquiring ground truth and the lack of inherent objective functions. In contrast, ensemble approaches for classification and clustering have long been studied and used effectively. Our work addresses this gap and builds a new ensemble approach for anomaly detection, with application to event detection in temporal graphs as well as outlier detection in no-graph settings. It combines multiple heterogeneous detectors to yield improved and robust performance. Importantly, trusting results from all the constituent detectors may degrade the overall performance of the ensemble, as some detectors could provide inaccurate results depending on the type of data at hand and the underlying assumptions of a detector. This suggests that combining the detectors selectively is key to building effective anomaly ensembles, hence "less is more". In this paper we propose a novel ensemble approach called SELECT for anomaly detection, which automatically and systematically selects the results from constituent detectors to combine in a fully unsupervised fashion. We apply our method to event detection in temporal graphs and outlier detection in multi-dimensional point data (no-graph), where SELECT successfully utilizes five base detectors and seven consensus methods under a unified ensemble framework. We provide extensive quantitative evaluation of our approach for event detection on five real-world datasets (four with ground truth events), including Enron email communications, Reality Mining SMS and phone call records, New York Times news corpus, and World Cup 2014 Twitter news feed. We also provide results for outlier detection on seven real-world multi-dimensional point datasets from the UCI Machine Learning Repository. Thanks to its selection mechanism, SELECT yields superior performance compared to the individual detectors alone, the full ensemble (naively combining all results), an existing diversity-based ensemble, and an existing weighted ensemble approach.
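
The abstract does not spell out how SELECT chooses which detector results to combine, so the following is only a loose illustration of the "less is more" idea and should not be read as the authors' algorithm. It assumes each detector returns one anomaly score per point (higher means more anomalous), rank-normalizes the scores so heterogeneous detectors are comparable, uses the per-point median across all detectors as a stand-in consensus, and greedily keeps a detector only if adding it brings the averaged ensemble closer (by Pearson correlation) to that consensus. The detector names, the toy scores, the median consensus, and the helper functions are all hypothetical choices made for this sketch.

```python
from statistics import mean, median


def to_unit_ranks(scores):
    """Rank-normalize scores to [0, 1]; assumes at least two points and no ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    for rank, idx in enumerate(order):
        ranks[idx] = rank / (len(scores) - 1)
    return ranks


def pearson(x, y):
    """Plain Pearson correlation; assumes both inputs have non-zero variance."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def average(score_lists):
    """Point-wise mean of several equally long score lists."""
    return [mean(column) for column in zip(*score_lists)]


def selective_ensemble(detector_scores):
    """Greedily pick detectors whose inclusion improves agreement with a consensus.

    Illustrative only: the consensus here is the per-point median of all
    rank-normalized detector outputs, which is an assumption of this sketch.
    """
    normalized = {name: to_unit_ranks(s) for name, s in detector_scores.items()}
    consensus = [median(column) for column in zip(*normalized.values())]

    # Consider detectors in order of decreasing agreement with the consensus.
    ordered = sorted(
        normalized,
        key=lambda name: pearson(normalized[name], consensus),
        reverse=True,
    )

    selected = [ordered[0]]
    for name in ordered[1:]:
        current = average([normalized[n] for n in selected])
        candidate = average([normalized[n] for n in selected + [name]])
        # Keep the detector only if the ensemble moves closer to the consensus.
        if pearson(candidate, consensus) > pearson(current, consensus):
            selected.append(name)

    return selected, average([normalized[n] for n in selected])


# Toy data: five points, the last one intended as the anomaly.
# Three detectors roughly agree; "det_noisy" ranks the points almost at random.
scores = {
    "det_a": [0.2, 0.1, 0.3, 0.4, 0.9],
    "det_b": [1.0, 2.0, 4.0, 3.0, 10.0],
    "det_d": [5.0, 6.0, 7.0, 9.0, 8.0],
    "det_noisy": [50.0, 30.0, 40.0, 10.0, 20.0],
}
selected, combined = selective_ensemble(scores)
print("selected detectors:", selected)
print("combined scores:", combined)
```

On this toy input the noisy detector is left out of the final combination, which is the behavior the abstract argues for; how well such a heuristic carries over to real data is not something this sketch can show.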

Original language: English
Article number: 42
Journal: ACM Transactions on Knowledge Discovery from Data
Volume: 10
Issue number: 4
State: Published - May 2016

Keywords

  • Anomaly ensembles
  • Anomaly mining
  • Dynamic graphs
  • Ensemble methods
  • Event detection
  • Rank aggregation
  • Unsupervised learning
