Do you know existing accuracy metrics overrate time-series anomaly detections?

WS Hwang, JH Yun, J Kim, BG Min - Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing, 2022 - dl.acm.org
An anomaly in time-series data occupies a time range, and a detection method usually detects only part of this range. Existing works assume that an expert can identify the whole anomaly range by analyzing its detected part. Under this expert scenario, a detection method achieves a higher score if the expert can find more anomalies using the method's predictions as clues. However, the existing metrics overrate imprecise and insufficient cases. The expert cannot detect any anomaly if a prediction indicates a wrong range that is unrelated to any anomaly (called an imprecise case). The expert also fails to detect an anomaly if a prediction covers too little of the anomaly range (called an insufficient case). For instance, if an anomaly has gradually drifted from its original pattern over a period of time, it is difficult to understand it from a small part of its range. Moreover, the existing metrics do not consider the length of incorrect predictions in their scores, even though a prolonged incorrect prediction inconveniences the expert more than a shorter one. We address these problems with two concepts: cross-referencing and a weighting scheme. Cross-referencing checks the anomalies and predictions involved in imprecise and insufficient cases and prevents them from receiving scores. With the weighting scheme, the overall score is a weighted sum of the scores given to individual predictions, so lengthy incorrect predictions are penalized. Based on these concepts, we propose novel metrics (i.e., eTaV and eTaff). Through evaluations on two real-world datasets, Secure Water Treatment and Hardware-in-the-Loop-based Augmented Industrial Control System, as well as several hypothetical datasets, we verify that the proposed metrics are more reasonable than existing metrics. Furthermore, our metrics have been validated in practice: they were employed in an anomaly detection competition (HAICon'21).
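To make the imprecise case, the insufficient case, and length weighting concrete, here is a toy Python sketch. It is NOT the metric definition from the paper. The thresholds `theta_p` and `theta_a`, the three-step cross-referencing rule, and the length-proportional weights are all hypothetical choices made only for illustration. `any_overlap_recall` is likewise a hypothetical lenient baseline that counts an anomaly as found if any prediction touches it, standing in for the kind of scoring the abstract says overrates detections.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open interval [start, end) of time steps


def overlap(a: Range, b: Range) -> int:
    """Number of time steps shared by two half-open ranges."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def length(r: Range) -> int:
    return r[1] - r[0]


def any_overlap_recall(anomalies: List[Range], predictions: List[Range]) -> float:
    """Hypothetical lenient baseline: an anomaly counts as found if any prediction touches it."""
    found = sum(1 for a in anomalies if any(overlap(a, p) > 0 for p in predictions))
    return found / len(anomalies)


def cross_referenced_scores(
    anomalies: List[Range],
    predictions: List[Range],
    theta_p: float = 0.5,  # hypothetical: min fraction of a prediction lying inside anomalies
    theta_a: float = 0.5,  # hypothetical: min fraction of an anomaly covered by valid predictions
) -> Tuple[float, float]:
    """Return (precision, recall) under a hypothetical cross-referencing rule.

    Assumes anomalies do not overlap each other and predictions do not overlap each other,
    so summing pairwise overlaps does not double-count time steps.
    """
    # Step 1: drop imprecise predictions (mostly outside every anomaly).
    valid = [
        p for p in predictions
        if sum(overlap(p, a) for a in anomalies) >= theta_p * length(p)
    ]
    # Step 2: count an anomaly as detected only if valid predictions cover
    # enough of it (filters insufficient cases).
    detected = [
        a for a in anomalies
        if sum(overlap(a, p) for p in valid) >= theta_a * length(a)
    ]
    # Step 3: cross-reference back: credit a prediction only if it overlaps a detected anomaly.
    credited = [p for p in valid if any(overlap(p, a) > 0 for a in detected)]
    recall = len(detected) / len(anomalies)
    # Length-weighted precision: each prediction's weight is its length,
    # so a long false alarm lowers precision more than a short one.
    total = sum(length(p) for p in predictions)
    precision = sum(length(p) for p in credited) / total if total else 0.0
    return precision, recall


if __name__ == "__main__":
    anomaly = [(100, 200)]
    # Insufficient case: the prediction touches only the last 5 of 100 anomalous steps.
    print(any_overlap_recall(anomaly, [(195, 205)]))       # 1.0
    print(cross_referenced_scores(anomaly, [(195, 205)]))  # (0.0, 0.0)
    # Same good prediction plus a short vs. a long false alarm.
    p_short, _ = cross_referenced_scores(anomaly, [(100, 180), (300, 310)])
    p_long, _ = cross_referenced_scores(anomaly, [(100, 180), (300, 400)])
    print(round(p_short, 2), round(p_long, 2))             # 0.89 0.44
```

A count-based precision (credited predictions divided by all predictions) would give 0.5 in both false-alarm cases. The length weights in this sketch are what separate the 10-step false alarm from the 100-step one.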