Jeoggyun Kim

DeepSketch: A New Machine Learning-Based Reference Search Technique for Post-Deduplication Delta Compression

Feb 17, 2022

Jisung Park, Jeoggyun Kim, Yeseong Kim, Sungjin Lee, Onur Mutlu

Abstract: Data reduction in storage systems is becoming increasingly important as an effective solution to minimize the management cost of a data center. To maximize data-reduction efficiency, existing post-deduplication delta-compression techniques perform delta compression along with traditional data deduplication and lossless compression. Unfortunately, we observe that existing techniques achieve significantly lower data-reduction ratios than the optimal ratio due to their limited accuracy in identifying similar data blocks. In this paper, we propose DeepSketch, a new reference search technique for post-deduplication delta compression that leverages the learning-to-hash method to achieve higher accuracy in reference search for delta compression, thereby improving data-reduction efficiency. DeepSketch uses a deep neural network to extract a data block's sketch, i.e., to create an approximate data signature of the block that preserves similarity with other blocks. Our evaluation using eleven real-world workloads shows that DeepSketch improves the data-reduction ratio by up to 33% (21% on average) over a state-of-the-art post-deduplication delta-compression technique.
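
The abstract describes the data path only in outline. Exact duplicates are removed first. Each remaining block then gets a sketch, and a block whose sketch is close to that of a stored block is delta-compressed against it; otherwise it is compressed losslessly. The Python example below is a toy illustration of that flow under stated assumptions, not DeepSketch itself. The neural network is replaced by SimHash over 4-byte shingles, which is a classic, untrained similarity-preserving hash. The delta encoder is approximated by DEFLATE with the reference block as a preset dictionary. The 64-bit sketch width, the 12-bit Hamming-distance threshold, the rule that only non-delta blocks serve as references, and all names (`sketch`, `PostDedupStore`, and so on) are hypothetical choices made for illustration.

```python
import hashlib
import zlib
from typing import Dict, List, Optional, Tuple

SKETCH_BITS = 64  # hypothetical sketch width in bits


def sketch(block: bytes, k: int = 4) -> int:
    """Stand-in sketch: SimHash over the block's distinct k-byte shingles.

    DeepSketch computes sketches with a trained neural network instead; SimHash
    only shares the key property that similar blocks yield sketches that differ
    in few bits.
    """
    counts = [0] * SKETCH_BITS
    shingles = {block[i:i + k] for i in range(len(block) - k + 1)}
    for sh in shingles:
        h = int.from_bytes(hashlib.blake2b(sh, digest_size=8).digest(), "big")
        for b in range(SKETCH_BITS):
            counts[b] += 1 if (h >> b) & 1 else -1
    return sum(1 << b for b in range(SKETCH_BITS) if counts[b] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two sketches."""
    return bin(a ^ b).count("1")


def delta_compress(block: bytes, ref: bytes) -> bytes:
    """Stand-in delta encoder: DEFLATE with the reference as preset dictionary."""
    c = zlib.compressobj(level=9, zdict=ref)
    return c.compress(block) + c.flush()


def delta_decompress(delta: bytes, ref: bytes) -> bytes:
    d = zlib.decompressobj(zdict=ref)
    return d.decompress(delta) + d.flush()


class PostDedupStore:
    """Toy pipeline: deduplication, then delta or lossless compression."""

    def __init__(self, max_distance: int = 12):
        self.max_distance = max_distance  # hypothetical similarity threshold
        self.fingerprints: Dict[bytes, int] = {}  # SHA-256 digest -> block id
        self.blocks: List[bytes] = []  # unique blocks, usable as references
        self.base_sketches: List[Tuple[int, int]] = []  # (sketch, id), non-delta only
        self.logical_bytes = 0
        self.physical_bytes = 0

    def find_reference(self, s: int) -> Optional[int]:
        """Return the id of the closest base block within max_distance, if any."""
        best = min(self.base_sketches, key=lambda e: hamming(s, e[0]), default=None)
        if best is not None and hamming(s, best[0]) <= self.max_distance:
            return best[1]
        return None

    def write(self, block: bytes) -> str:
        self.logical_bytes += len(block)
        fp = hashlib.sha256(block).digest()
        if fp in self.fingerprints:  # exact duplicate: nothing new is stored
            return "dedup"
        block_id = len(self.blocks)
        self.fingerprints[fp] = block_id
        self.blocks.append(block)
        s = sketch(block)
        lossless = zlib.compress(block, 9)
        ref = self.find_reference(s)  # sketch-based reference search
        if ref is not None:
            delta = delta_compress(block, self.blocks[ref])
            if len(delta) < len(lossless):  # keep the smaller encoding
                self.physical_bytes += len(delta)
                return "delta"
        # No usable reference: compress losslessly and register as a base block.
        self.physical_bytes += len(lossless)
        self.base_sketches.append((s, block_id))
        return "lossless"

    def reduction_ratio(self) -> float:
        return self.logical_bytes / max(self.physical_bytes, 1)


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    base = bytes(rng.getrandbits(8) for _ in range(4096))
    similar = bytearray(base)
    for pos in range(0, 4096, 256):
        similar[pos] ^= 0xFF  # flip 16 scattered bytes
    other = bytes(rng.getrandbits(8) for _ in range(4096))
    store = PostDedupStore()
    print([store.write(b) for b in (base, base, bytes(similar), other)])
    print(f"data-reduction ratio: {store.reduction_ratio():.2f}")
```

In this toy setup, the accuracy of the sketch function alone determines whether a similar block is found and delta-compressed. The abstract's argument is that this step limits existing techniques, and it is the step DeepSketch targets with a learned hash. The linear scan in `find_reference` is a simplification; a system holding many stored sketches would likely need an index rather than a full scan.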

* Full paper to appear in USENIX FAST 2022