BitShred: Fast, Scalable Malware Triage

J Jang, D Brumley, S Venkataraman
CyLab, Carnegie Mellon University, Pittsburgh, PA, Technical Report …, 2010 - users.ece.cmu.edu
Abstract
The sheer volume of new malware found each day is enormous. Worse, current trends show the amount of malware is doubling each year. The large-scale volume has created a need for automated large-scale triage techniques. Typical triage tasks include clustering malware into families and finding the nearest neighbor to a given malware sample.
In this paper we propose efficient techniques for large-scale malware triage. At the core of our work is BitShred, a framework for data mining features extracted by existing per-sample malware analysis. BitShred uses a probabilistic data structure, created through feature hashing, for large-scale correlation that is agnostic to the per-sample malware analysis used. BitShred then defines a fast variant of the Jaccard similarity metric to compare malware feature sets. We also develop a distributed version of BitShred that is optimal: given 2x more hardware, we get 2x the performance. After clustering, BitShred can go one step further than previous similar work and automatically discover semantic inter-family and inter-malware distinguishing features, using co-clustering techniques adapted to BitShred's fingerprints. We have implemented and evaluated BitShred using two different per-sample analysis routines: one based on static code reuse detection and one based on dynamic behavior analysis. Our evaluation shows that BitShred's probabilistic data structure and algorithms speed up typical malware triage tasks by up to three orders of magnitude and use up to 82x less memory, all with accuracy similar to previous approaches.
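The abstract describes fingerprints built by feature hashing and compared with a fast Jaccard variant, but gives no parameters. The sketch below is an illustration under stated assumptions, not BitShred's implementation: it assumes one hash per feature (SHA-1 truncated to 64 bits, taken modulo a hypothetical fingerprint size of 2^16 bits), uses byte 4-grams as a stand-in for static code-reuse features, and stores each fingerprint as a Python int used as a bit vector. The names `fingerprint`, `bit_jaccard`, `byte_ngrams`, and `FINGERPRINT_BITS` are hypothetical.

```python
import hashlib

# Hypothetical fingerprint size in bits (an assumption, not the paper's setting).
FINGERPRINT_BITS = 1 << 16


def fingerprint(features, m=FINGERPRINT_BITS):
    """Feature-hash an iterable of string features into an m-bit fingerprint.

    Assumption: each feature sets exactly one bit, chosen by hashing the
    feature with SHA-1 and reducing the first 8 digest bytes modulo m.
    """
    bits = 0
    for feature in features:
        digest = hashlib.sha1(feature.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % m
        bits |= 1 << index  # set the bit this feature hashes to
    return bits


def bit_jaccard(fa, fb):
    """Approximate Jaccard similarity: popcount(A AND B) / popcount(A OR B)."""
    union = bin(fa | fb).count("1")
    if union == 0:
        return 1.0  # two empty fingerprints are treated as identical
    return bin(fa & fb).count("1") / union


def byte_ngrams(data, n=4):
    """Hypothetical static feature extractor: overlapping byte n-grams as hex strings."""
    return {data[i:i + n].hex() for i in range(len(data) - n + 1)}


if __name__ == "__main__":
    # Two toy byte sequences that differ in one byte.
    a = fingerprint(byte_ngrams(bytes.fromhex("5589e583ec10c9c3")))
    b = fingerprint(byte_ngrams(bytes.fromhex("5589e583ec20c9c3")))
    print(bit_jaccard(a, b))
```

The two toy sequences share 2 of their 8 distinct 4-grams, so their exact Jaccard similarity is 2/8 = 0.25; the fingerprint estimate equals that unless two of the eight 4-grams hash to the same bit. Comparing two fingerprints costs a few bitwise operations over m bits no matter how many features each sample has. Collisions between distinct features are the source of approximation error, and a larger m reduces them at the cost of memory.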