Do we need a perfect ground-truth for benchmarking Internet traffic classifiers?

MR Oliveira, J Neves, R Valadas… - 2015 IEEE Conference …, 2015 - ieeexplore.ieee.org
2015 IEEE Conference on Computer Communications (INFOCOM), 2015
The classification of Internet traffic using supervised or semi-supervised statistical learning techniques, both for anomaly detection and identification of Internet applications, has been impaired by difficulties in obtaining a reliable ground-truth, required both to train the classifier and to evaluate its performance. A perfect ground-truth is increasingly difficult, or sometimes impossible, to obtain due to the growing percentage of cyphered traffic, the sophistication of network attacks, and the constant updates of Internet applications. In this paper, we study the impact of the ground-truth on training the classifier and estimating its performance measures. We show both theoretically and through simulation that ground-truth imperfections can severely bias the performance estimates. We then propose a latent class model that overcomes this problem by combining estimates of several classifiers over the same dataset. The model is evaluated using a high-quality dataset that includes the most representative Internet applications and network attacks. The results show that our latent class model produces very good performance estimates under mild levels of ground-truth imperfection, and can thus be used to correctly benchmark Internet traffic classifiers when only an imperfect ground-truth is available.
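
The bias that an imperfect ground-truth introduces into performance estimates can be illustrated with a small simulation. The sketch below is not the paper's latent class model or its simulation setup. It assumes a binary task in which the classifier and the reference labels err independently and symmetrically. The function names and the error rates (0.05 and 0.10) are hypothetical, chosen only for illustration.

```python
import random


def flip(label, error_rate, rng):
    """Return the opposite binary label with probability error_rate."""
    return 1 - label if rng.random() < error_rate else label


def simulate(n=200_000, classifier_error=0.05, ground_truth_error=0.10, seed=0):
    """Compare accuracy measured against true labels and against noisy labels.

    Assumption: classifier mistakes and ground-truth mistakes are independent.
    """
    rng = random.Random(seed)
    agree_true = 0
    agree_reference = 0
    for _ in range(n):
        truth = rng.randint(0, 1)                       # unobservable true class
        predicted = flip(truth, classifier_error, rng)  # classifier output
        reference = flip(truth, ground_truth_error, rng)  # imperfect ground-truth
        agree_true += predicted == truth
        agree_reference += predicted == reference
    return agree_true / n, agree_reference / n


def expected_measured_accuracy(classifier_error, ground_truth_error):
    """Agreement probability under independent symmetric binary errors:
    both labels correct, or both labels wrong."""
    a = 1 - classifier_error
    g = 1 - ground_truth_error
    return a * g + (1 - a) * (1 - g)


true_acc, measured_acc = simulate()
print(f"accuracy vs. true labels:      {true_acc:.3f}")
print(f"accuracy vs. noisy reference:  {measured_acc:.3f}")
print(f"closed-form expectation:       {expected_measured_accuracy(0.05, 0.10):.3f}")
```

Under these assumptions a classifier that is right 95% of the time appears to be right only about 86% of the time when scored against a reference that is itself wrong 10% of the time, which is the kind of bias the abstract describes.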