Milad Alizadeh

On Leakage of Code Generation Evaluation Datasets

Jul 11, 2024

Alexandre Matton, Tom Sherborne, Dennis Aumiller, Elena Tommasone, Milad Alizadeh, Jingyi He, Raymond Ma, Maxime Voisin, Ellen Gilsenan-McMahon, Matthias Gallé

Abstract: In this paper, we consider contamination by code generation test sets, in particular in their use in modern large language models. We discuss three possible sources of such contamination and show findings supporting each of them: (i) direct data leakage, (ii) indirect data leakage through the use of synthetic data, and (iii) overfitting to evaluation sets during model selection. Key to our findings is a new dataset of 161 prompts with their associated Python solutions, which is released at https://huggingface.co/datasets/CohereForAI/lbpp.

* 4 main pages, 9 in total

Prospect Pruning: Finding Trainable Weights at Initialization using Meta-Gradients

Feb 16, 2022

Milad Alizadeh, Shyam A. Tailor, Luisa M Zintgraf, Joost van Amersfoort, Sebastian Farquhar, Nicholas Donald Lane, Yarin Gal

Abstract: Pruning neural networks at initialization would enable us to find sparse models that retain the accuracy of the original network while consuming fewer computational resources for training and inference. However, current methods are insufficient to enable this optimization and lead to a large degradation in model performance. In this paper, we identify a fundamental limitation in the formulation of current methods, namely that their saliency criteria look at a single step at the start of training without taking into account the trainability of the network. While pruning iteratively and gradually has been shown to improve pruning performance, explicit consideration of the training stage that will immediately follow pruning has so far been absent from the computation of the saliency criterion. To overcome the short-sightedness of existing methods, we propose Prospect Pruning (ProsPr), which uses meta-gradients through the first few steps of optimization to determine which weights to prune. ProsPr combines an estimate of the higher-order effects of pruning on the loss and the optimization trajectory to identify the trainable sub-network. Our method achieves state-of-the-art pruning performance on a variety of vision classification tasks, with less data and in a single shot compared to existing pruning-at-initialization methods.

COIN++: Data Agnostic Neural Compression

Jan 30, 2022

Emilien Dupont, Hrushikesh Loya, Milad Alizadeh, Adam Goliński, Yee Whye Teh, Arnaud Doucet

Abstract: Neural compression algorithms are typically based on autoencoders that require specialized encoder and decoder architectures for different data modalities. In this paper, we propose COIN++, a neural compression framework that seamlessly handles a wide range of data modalities. Our approach is based on converting data to implicit neural representations, i.e. neural functions that map coordinates (such as pixel locations) to features (such as RGB values). Then, instead of storing the weights of the implicit neural representation directly, we store modulations applied to a meta-learned base network as a compressed code for the data. We further quantize and entropy code these modulations, leading to large compression gains while reducing encoding time by two orders of magnitude compared to baselines. We empirically demonstrate the effectiveness of our method by compressing various data modalities, from images to medical and climate data.

COIN: COmpression with Implicit Neural representations

Mar 03, 2021

Emilien Dupont, Adam Goliński, Milad Alizadeh, Yee Whye Teh, Arnaud Doucet

Abstract: We propose a new, simple approach for image compression: instead of storing the RGB values for each pixel of an image, we store the weights of a neural network overfitted to the image. Specifically, to encode an image, we fit it with an MLP which maps pixel locations to RGB values. We then quantize and store the weights of this MLP as a code for the image. To decode the image, we simply evaluate the MLP at every pixel location. We found that this simple approach outperforms JPEG at low bit-rates, even without entropy coding or learning a distribution over weights. While our framework is not yet competitive with state-of-the-art compression methods, we show that it has various attractive properties which could make it a viable alternative to other neural data compression approaches.
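
The quantize-and-decode steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the fitting step is omitted and random weights stand in for a fitted network, and the layer sizes, the sine hidden activation, the sigmoid output, the uniform 8-bit quantizer, and all function names are assumptions made for illustration.

```python
import math
import random


def make_mlp(sizes, seed=0):
    """Random stand-in weights; in practice they would come from fitting one image."""
    rng = random.Random(seed)
    return [
        ([[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)],
         [0.0] * n_out)
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def forward(layers, x):
    """Map one normalized (row, col) coordinate to an RGB triple in [0, 1]."""
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            x = [math.sin(v) for v in x]  # hidden activation (assumed)
    return [1.0 / (1.0 + math.exp(-v)) for v in x]  # squash outputs to [0, 1]


def quantize(layers, bits):
    """Uniformly quantize all weights to 2**bits levels, then dequantize them.

    Biases are left untouched in this sketch (they are all zero here).
    """
    flat = [w for weights, _ in layers for row in weights for w in row]
    lo, hi = min(flat), max(flat)
    step = (hi - lo) / (2 ** bits - 1)

    def snap(w):
        return lo + round((w - lo) / step) * step

    snapped = [([[snap(w) for w in row] for row in weights], biases)
               for weights, biases in layers]
    return snapped, step


def decode(layers, height, width):
    """Evaluate the MLP at every pixel location to rebuild the image."""
    image = []
    for r in range(height):
        y = 2.0 * r / max(height - 1, 1) - 1.0  # normalize row index to [-1, 1]
        row = []
        for c in range(width):
            x = 2.0 * c / max(width - 1, 1) - 1.0  # normalize column index
            row.append(forward(layers, [y, x]))
        image.append(row)
    return image


mlp = make_mlp([2, 8, 8, 3])
quantized_mlp, step = quantize(mlp, bits=8)
image = decode(quantized_mlp, height=4, width=4)
```

In this sketch the stored "code" for the image would be the integer quantization indices plus the range `lo`/`hi`; the image resolution only enters at decode time, through the coordinate grid.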

Single Shot Structured Pruning Before Training

Jul 01, 2020

Joost van Amersfoort, Milad Alizadeh, Sebastian Farquhar, Nicholas Lane, Yarin Gal

Abstract: We introduce a method to speed up training by 2x and inference by 3x in deep neural networks using structured pruning applied before training. Unlike previous works on pruning before training which prune individual weights, our work develops a methodology to remove entire channels and hidden units with the explicit aim of speeding up training and inference. We introduce a compute-aware scoring mechanism which enables pruning in units of sensitivity per FLOP removed, allowing even greater speed-ups. Our method is fast, easy to implement, and needs just one forward/backward pass on a single batch of data to complete pruning before training begins.
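
The idea of scoring in units of sensitivity per FLOP removed can be sketched as follows. This is only an illustration under stated assumptions, not the paper's procedure: the per-channel sensitivities and FLOP counts are made-up inputs, and the function names and the greedy budget loop are hypothetical.

```python
def rank_channels_for_pruning(sensitivity, flops):
    """Order channels from most to least attractive to prune.

    A channel with low sensitivity per FLOP costs little accuracy for the
    compute it frees, so it is ranked first.
    """
    scores = {name: sensitivity[name] / flops[name] for name in sensitivity}
    return sorted(scores, key=scores.get)


def prune_to_budget(sensitivity, flops, flop_budget):
    """Greedily remove channels until the remaining FLOPs fit the budget."""
    remaining = sum(flops.values())
    pruned = []
    for name in rank_channels_for_pruning(sensitivity, flops):
        if remaining <= flop_budget:
            break
        pruned.append(name)
        remaining -= flops[name]
    return pruned, remaining


# Hypothetical per-channel statistics, e.g. gathered from one batch.
sensitivity = {"a": 0.9, "b": 0.1, "c": 0.4}
flops = {"a": 100, "b": 50, "c": 400}
pruned, remaining = prune_to_budget(sensitivity, flops, flop_budget=200)
```

Here channel `c` has a higher raw sensitivity than `b`, yet it is pruned first because removing it frees eight times as many FLOPs.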

Gradient $\ell_1$ Regularization for Quantization Robustness

Feb 18, 2020

Milad Alizadeh, Arash Behboodi, Mart van Baalen, Christos Louizos, Tijmen Blankevoort, Max Welling

Abstract: We analyze the effect of quantizing weights and activations of neural networks on their loss and derive a simple regularization scheme that improves robustness against post-training quantization. By training quantization-ready networks, our approach enables storing a single set of weights that can be quantized on-demand to different bit-widths as energy and memory requirements of the application change. Unlike quantization-aware training using the straight-through estimator that only targets a specific bit-width and requires access to training data and pipeline, our regularization-based method paves the way for "on the fly" post-training quantization to various bit-widths. We show that by modeling quantization as an $\ell_\infty$-bounded perturbation, the first-order term in the loss expansion can be regularized using the $\ell_1$-norm of gradients. We experimentally validate the effectiveness of our regularization scheme on different architectures on CIFAR-10 and ImageNet datasets.
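
The link between an $\ell_\infty$-bounded perturbation and the $\ell_1$-norm of gradients follows from Hölder's inequality. As a short sketch, with notation assumed here rather than taken from the paper ($L$ for the loss, $w$ for the weights, $\delta$ for the quantization perturbation, $\epsilon$ for its bound):

$$
L(w + \delta) \approx L(w) + \delta^\top \nabla_w L(w),
\qquad
\left| \delta^\top \nabla_w L(w) \right|
\le \|\delta\|_\infty \, \|\nabla_w L(w)\|_1
\le \epsilon \, \|\nabla_w L(w)\|_1 .
$$

The bound is attained at $\delta = \epsilon \, \mathrm{sign}(\nabla_w L(w))$, so penalizing $\|\nabla_w L(w)\|_1$ controls the worst-case first-order change in the loss over all perturbations with $\|\delta\|_\infty \le \epsilon$.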

* ICLR 2020

A Systematic Comparison of Bayesian Deep Learning Robustness in Diabetic Retinopathy Tasks

Dec 22, 2019

Angelos Filos, Sebastian Farquhar, Aidan N. Gomez, Tim G. J. Rudner, Zachary Kenton, Lewis Smith, Milad Alizadeh, Arnoud de Kroon, Yarin Gal

Abstract: Evaluation of Bayesian deep learning (BDL) methods is challenging. We often seek to evaluate the methods' robustness and scalability, assessing whether new tools give "better" uncertainty estimates than old ones. These evaluations are paramount for practitioners when choosing BDL tools on top of which they build their applications. Current popular evaluations of BDL methods, such as the UCI experiments, are lacking: methods that excel with these experiments often fail when used in applications such as medical or automotive, suggesting a pertinent need for new benchmarks in the field. We propose a new BDL benchmark with a diverse set of tasks, inspired by a real-world medical imaging application on diabetic retinopathy diagnosis. Visual inputs (512x512 RGB images of retinas) are considered, where model uncertainty is used for medical pre-screening, i.e. to refer patients to an expert when model diagnosis is uncertain. Methods are then ranked according to metrics derived from the expert domain to reflect real-world use of model uncertainty in automated diagnosis. We develop multiple tasks that fall under this application, including out-of-distribution detection and robustness to distribution shift. We then perform a systematic comparison of well-tuned BDL techniques on the various tasks. From our comparison we conclude that some current techniques which solve benchmarks such as UCI "overfit" their uncertainty to the dataset; when evaluated on our benchmark, these underperform in comparison to simpler baselines. The code for the benchmark, its baselines, and a simple API for evaluating new BDL tools are made available at https://github.com/oatml/bdl-benchmarks.
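
The pre-screening setup, in which uncertain cases are referred to an expert, can be illustrated with a small sketch. It is not the benchmark's API: the function name, the inputs, and the choice of accuracy on retained cases as the metric are assumptions made for illustration.

```python
def accuracy_after_referral(correct, uncertainty, refer_fraction):
    """Accuracy on the cases the model keeps after referring the most uncertain ones.

    correct: 1 if the model's diagnosis was right for that case, else 0.
    uncertainty: per-case uncertainty estimate (higher means less certain).
    refer_fraction: share of cases handed to an expert, between 0 and 1.
    """
    n = len(correct)
    n_keep = n - int(round(refer_fraction * n))
    if n_keep == 0:
        return None  # everything was referred; accuracy is undefined
    most_certain_first = sorted(range(n), key=lambda i: uncertainty[i])
    kept = most_certain_first[:n_keep]
    return sum(correct[i] for i in kept) / n_keep


# Hypothetical toy data: the two wrong predictions are also the least certain.
correct = [1, 1, 0, 1, 0]
uncertainty = [0.1, 0.2, 0.9, 0.3, 0.8]
no_referral = accuracy_after_referral(correct, uncertainty, 0.0)
with_referral = accuracy_after_referral(correct, uncertainty, 0.4)
```

If the uncertainty estimates are informative, accuracy on the retained cases should rise as more cases are referred, which is the behavior this toy data shows.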
