Michal Malka

ZipNN: Lossless Compression for AI Models

Nov 07, 2024

Moshik Hershcovitch, Andrew Wood, Leshem Choshen, Guy Girmonsky, Roy Leibovitz, Ilias Ennmouri, Michal Malka, Peter Chin, Swaminathan Sundararaman, Danny Harnik

Abstract: As models grow in size and are deployed at ever larger scale, their sheer size burdens infrastructure, requiring more network bandwidth and more storage to accommodate them. While there is a vast model compression literature on deleting parts of the model weights for faster inference, we investigate a more traditional type of compression: one that represents the model in a compact form and is coupled with a decompression algorithm that returns it to its original form and size, namely lossless compression. We present ZipNN, a lossless compression method tailored to neural networks. Somewhat surprisingly, we show that specific lossless compression can yield significant network and storage reductions on popular models, often saving 33% and at times reducing model size by over 50%. We investigate the source of model compressibility and introduce specialized compression variants tailored for models that further increase the effectiveness of compression. On popular models (e.g., Llama 3), ZipNN shows space savings that are over 17% better than vanilla compression while also improving compression and decompression speeds by 62%. We estimate that these methods could save over an exabyte per month of network traffic downloaded from a large model hub like Hugging Face.

* arXiv admin note: substantial text overlap with arXiv:2404.15198
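
The abstract above defines lossless compression as a compact representation paired with a decompression step that restores the original form and size. The sketch below illustrates only that round-trip property and how a space-saving percentage is computed. It is not ZipNN: it uses Python's general-purpose zlib as a stand-in for "vanilla compression", and the helper names (make_fake_weights, space_saving) and the synthetic Gaussian "weights" are assumptions made for illustration.

```python
import random
import struct
import zlib


def make_fake_weights(n, seed=0):
    """Hypothetical stand-in for model weights: n little-endian float32 values."""
    rng = random.Random(seed)
    return struct.pack(f"<{n}f", *(rng.gauss(0.0, 0.02) for _ in range(n)))


def space_saving(original, compressed):
    """Fraction of bytes saved, e.g. 0.33 means the output is 33% smaller."""
    return 1.0 - len(compressed) / len(original)


raw = make_fake_weights(10_000)
packed = zlib.compress(raw, 6)      # generic compressor, not ZipNN
restored = zlib.decompress(packed)

# Lossless means the round trip is bit-exact, in both content and size.
assert restored == raw
print(f"original bytes:   {len(raw)}")
print(f"compressed bytes: {len(packed)}")
print(f"space saving:     {space_saving(raw, packed):.1%}")
```

The saving measured on synthetic data says nothing about real checkpoints; the percentages quoted in the abstract come from the authors' own experiments on actual models.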

DeCorus: Hierarchical Multivariate Anomaly Detection at Cloud-Scale

Feb 14, 2022

Bruno Wassermann, David Ohana, Ronen Schaffer, Robert Shahla, Elliot K. Kolodner, Eran Raichstein, Michal Malka

Abstract: Multivariate anomaly detection can be used to identify outages within large volumes of telemetry data for computing systems. However, developing an efficient anomaly detector that can provide users with relevant information is a challenging problem. We introduce DeCorus, our approach to hierarchical multivariate anomaly detection: a statistical multivariate anomaly detector that achieves linear complexity. It extends standard statistical techniques to improve their ability to find relevant anomalies within noisy signals and makes use of types of domain knowledge that system operators commonly possess to compute system-level anomaly scores. We describe the implementation of DeCorus as an online log anomaly detection tool for network device syslog messages deployed at a cloud service provider. We use real-world data sets that consist of 1.5 billion network device syslog messages and hundreds of incident tickets to characterize the performance of DeCorus and compare its ability to detect incidents with five alternative anomaly detectors. While DeCorus outperforms the other anomaly detectors, all of them are challenged by our data set. We share how DeCorus provides value in the field and how we plan to improve its incident detection accuracy.

* 11 pages, 4 figures, draft
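
The abstract above mentions statistical detection with linear complexity and system-level anomaly scores, but gives no algorithmic detail. As a rough illustration of those two ideas only, the sketch below keeps a running mean and variance per metric (Welford's online algorithm, constant work per observation) and combines per-metric z-scores into one score by taking the maximum. This is not the DeCorus algorithm: the metric names, the sample counts, and the max-based aggregation are all hypothetical choices.

```python
import math


class RunningStats:
    """Welford's online mean/variance: O(1) work per observation."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def zscore(self, x):
        if self.n < 2:
            return 0.0
        std = math.sqrt(self.m2 / (self.n - 1))
        return 0.0 if std == 0.0 else (x - self.mean) / std


def system_score(stats, sample):
    """Combine per-metric |z| into one system-level score (max is an assumption)."""
    return max(abs(stats[name].zscore(value)) for name, value in sample.items())


# Hypothetical per-interval counts of two syslog message types.
history = [
    {"link_down": 1, "auth_fail": 3},
    {"link_down": 2, "auth_fail": 4},
    {"link_down": 1, "auth_fail": 5},
    {"link_down": 2, "auth_fail": 4},
]
stats = {"link_down": RunningStats(), "auth_fail": RunningStats()}
for sample in history:
    for name, value in sample.items():
        stats[name].update(value)

normal = system_score(stats, {"link_down": 2, "auth_fail": 4})
spike = system_score(stats, {"link_down": 40, "auth_fail": 4})
print(f"normal interval score: {normal:.2f}")
print(f"spike interval score:  {spike:.2f}")
```

Each observation is processed once with constant work per metric, so total cost grows linearly with the number of observations, which is the property the sketch is meant to show.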
