SiLo: A Similarity-Locality based Near-Exact deduplication scheme with low RAM overhead and high throughput

W Xia, H Jiang, D Feng, Y Hua
2011 USENIX Annual Technical Conference (USENIX ATC 11), 2011 - usenix.org
Abstract
Data deduplication is becoming increasingly popular in storage systems as a space-efficient approach to data backup and archiving. Most existing state-of-the-art deduplication methods are either locality-based or similarity-based and, according to our analysis, neither works adequately in many situations. The former produces poor deduplication throughput when there is little or no locality in datasets, while the latter can fail to identify, and thus remove, significant amounts of redundant data when there is a lack of similarity among files. In this paper, we present SiLo, a near-exact deduplication system that effectively and complementarily exploits similarity and locality to achieve high duplicate elimination and throughput at extremely low RAM overheads. The main idea behind SiLo is to expose and exploit more similarity by grouping strongly correlated small files into a segment and segmenting large files, and to leverage locality in the backup stream by grouping contiguous segments into blocks to capture similar and duplicate data missed by the probabilistic similarity detection. By judiciously enhancing similarity through the exploitation of locality and vice versa, the SiLo approach is able to significantly reduce RAM usage for index lookup and maintain a very high deduplication throughput. Our experimental evaluation of SiLo based on real-world datasets shows that the SiLo system consistently and significantly outperforms two existing state-of-the-art systems, one based on similarity and the other based on locality, under various workload conditions.
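The interplay of similarity and locality described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class and method names (`SiLoSketch`, `add_segment`) are hypothetical, the choice of the minimum chunk fingerprint as a segment's representative is an assumption the abstract does not state, and plain in-memory dictionaries stand in for on-disk block storage. Only one representative fingerprint per segment is kept in the RAM-resident index; a similarity hit loads the whole block of contiguous segments, so neighbouring segments can be deduplicated even when their own similarity lookup misses.

```python
import hashlib


def fingerprint(chunk: bytes) -> str:
    """Content fingerprint of one chunk (SHA-1 is an illustrative choice)."""
    return hashlib.sha1(chunk).hexdigest()


class SiLoSketch:
    """Toy similarity + locality deduplicator (hypothetical, for illustration)."""

    def __init__(self, segments_per_block: int = 2):
        self.segments_per_block = segments_per_block
        self.similarity_index = {}  # representative fingerprint -> block id (small, in RAM)
        self.blocks = {}            # block id -> chunk fingerprints (stands in for disk)
        self.cache = set()          # fingerprints of blocks loaded so far (unbounded here)
        self._open_block = set()    # fingerprints of the block currently being filled
        self._open_reps = []        # representatives of the segments in the open block
        self._next_block_id = 0

    def add_segment(self, chunks) -> int:
        """Deduplicate one segment and return the number of duplicate chunks found."""
        if not chunks:
            return 0
        fps = [fingerprint(c) for c in chunks]
        rep = min(fps)  # assumption: the minimum fingerprint represents the segment

        # Similarity: one index lookup per segment instead of one per chunk.
        block_id = self.similarity_index.get(rep)
        if block_id is not None:
            # Locality: load the whole block, including neighbouring segments.
            self.cache |= self.blocks[block_id]

        known = self.cache | self._open_block
        duplicates = 0
        for fp in fps:
            if fp in known:
                duplicates += 1
            else:
                known.add(fp)

        # Record the segment in the open block; seal the block once it is full.
        self._open_block.update(fps)
        self._open_reps.append(rep)
        if len(self._open_reps) == self.segments_per_block:
            self.blocks[self._next_block_id] = self._open_block
            for r in self._open_reps:
                self.similarity_index[r] = self._next_block_id
            self._next_block_id += 1
            self._open_block = set()
            self._open_reps = []
        return duplicates
```

In this sketch, a segment whose representative fingerprint has changed is not found by the similarity index on its own, which is why the scheme is near-exact rather than exact. If an unchanged neighbouring segment was processed shortly before, however, its similarity hit has already pulled the shared block into the cache, and the changed segment's remaining chunks are still detected as duplicates.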