Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Mike Rabbat

EvalGIM: A Library for Evaluating Generative Image Models

Dec 18, 2024

Melissa Hall, Oscar Mañas, Reyhane Askari-Hemmat, Mark Ibrahim, Candace Ross, Pietro Astolfi, Tariq Berrada Ifriqi, Marton Havasi, Yohann Benchetrit, Karen Ullrich(+7 more)

Figure 1 for EvalGIM: A Library for Evaluating Generative Image Models

Figure 2 for EvalGIM: A Library for Evaluating Generative Image Models

Figure 3 for EvalGIM: A Library for Evaluating Generative Image Models

Figure 4 for EvalGIM: A Library for Evaluating Generative Image Models

Abstract:As the use of text-to-image generative models increases, so does the adoption of automatic benchmarking methods used in their evaluation. However, while metrics and datasets abound, there are few unified benchmarking libraries that provide a framework for performing evaluations across many datasets and metrics. Furthermore, the rapid introduction of increasingly robust benchmarking methods requires that evaluation libraries remain flexible to new datasets and metrics. Finally, there remains a gap in synthesizing evaluations in order to deliver actionable takeaways about model performance. To enable unified, flexible, and actionable evaluations, we introduce EvalGIM (pronounced ''EvalGym''), a library for evaluating generative image models. EvalGIM contains broad support for datasets and metrics used to measure quality, diversity, and consistency of text-to-image generative models. In addition, EvalGIM is designed with flexibility for user customization as a top priority and contains a structure that allows plug-and-play additions of new datasets and metrics. To enable actionable evaluation insights, we introduce ''Evaluation Exercises'' that highlight takeaways for specific evaluation questions. The Evaluation Exercises contain easy-to-use and reproducible implementations of two state-of-the-art evaluation methods of text-to-image generative models: consistency-diversity-realism Pareto Fronts and disaggregated measurements of performance disparities across groups. EvalGIM also contains Evaluation Exercises that introduce two new analysis methods for text-to-image generative models: robustness analyses of model rankings and balanced evaluations across different prompt styles. We encourage text-to-image model exploration with EvalGIM and invite contributions at https://github.com/facebookresearch/EvalGIM/.

* For code, see https://github.com/facebookresearch/EvalGIM/tree/main

Via

Access Paper or Ask Questions

Transformers Can Navigate Mazes With Multi-Step Prediction

Dec 06, 2024

Niklas Nolte, Ouail Kitouni, Adina Williams, Mike Rabbat, Mark Ibrahim

Figure 1 for Transformers Can Navigate Mazes With Multi-Step Prediction

Figure 2 for Transformers Can Navigate Mazes With Multi-Step Prediction

Figure 3 for Transformers Can Navigate Mazes With Multi-Step Prediction

Figure 4 for Transformers Can Navigate Mazes With Multi-Step Prediction

Abstract:Despite their remarkable success in language modeling, transformers trained to predict the next token in a sequence struggle with long-term planning. This limitation is particularly evident in tasks requiring foresight to plan multiple steps ahead such as maze navigation. The standard next single token prediction objective, however, offers no explicit mechanism to predict multiple steps ahead - or revisit the path taken so far. Consequently, in this work we study whether explicitly predicting multiple steps ahead (and backwards) can improve transformers' maze navigation. We train parameter-matched transformers from scratch, under identical settings, to navigate mazes of varying types and sizes with standard next token prediction and MLM-U, an objective explicitly predicting multiple steps ahead and backwards. We find that MLM-U considerably improves transformers' ability to navigate mazes compared to standard next token prediction across maze types and complexities. We also find MLM-U training is 4x more sample efficient and converges 2x faster in terms of GPU training hours relative to next token training. Finally, for more complex mazes we find MLM-U benefits from scaling to larger transformers. Remarkably, we find transformers trained with MLM-U outperform larger transformers trained with next token prediction using additional supervision from A* search traces. We hope these findings underscore the promise of learning objectives to advance transformers' capacity for long-term planning.

* 20 pages, 15 figures

Via

Access Paper or Ask Questions

The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More

Jun 07, 2024

Ouail Kitouni, Niklas Nolte, Diane Bouchacourt, Adina Williams, Mike Rabbat, Mark Ibrahim

Figure 1 for The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More

Figure 2 for The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More

Figure 3 for The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More

Figure 4 for The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More

Abstract:Today's best language models still struggle with hallucinations: factually incorrect generations, which impede their ability to reliably retrieve information seen during training. The reversal curse, where models cannot recall information when probed in a different order than was encountered during training, exemplifies this in information retrieval. We reframe the reversal curse as a factorization curse - a failure of models to learn the same joint distribution under different factorizations. Through a series of controlled experiments with increasing levels of realism including WikiReversal, a setting we introduce to closely simulate a knowledge intensive finetuning task, we find that the factorization curse is an inherent failure of the next-token prediction objective used in popular large language models. Moreover, we demonstrate reliable information retrieval cannot be solved with scale, reversed tokens, or even naive bidirectional-attention training. Consequently, various approaches to finetuning on specialized data would necessarily provide mixed results on downstream tasks, unless the model has already seen the right sequence of tokens. Across five tasks of varying levels of complexity, our results uncover a promising path forward: factorization-agnostic objectives can significantly mitigate the reversal curse and hint at improved knowledge storage and planning capabilities.

* 18 pages, 7 figures

Via

Access Paper or Ask Questions

DP-RDM: Adapting Diffusion Models to Private Domains Without Fine-Tuning

Mar 21, 2024

Jonathan Lebensold, Maziar Sanjabi, Pietro Astolfi, Adriana Romero-Soriano, Kamalika Chaudhuri, Mike Rabbat, Chuan Guo

Figure 1 for DP-RDM: Adapting Diffusion Models to Private Domains Without Fine-Tuning

Figure 2 for DP-RDM: Adapting Diffusion Models to Private Domains Without Fine-Tuning

Figure 3 for DP-RDM: Adapting Diffusion Models to Private Domains Without Fine-Tuning

Figure 4 for DP-RDM: Adapting Diffusion Models to Private Domains Without Fine-Tuning

Abstract:Text-to-image diffusion models have been shown to suffer from sample-level memorization, possibly reproducing near-perfect replica of images that they are trained on, which may be undesirable. To remedy this issue, we develop the first differentially private (DP) retrieval-augmented generation algorithm that is capable of generating high-quality image samples while providing provable privacy guarantees. Specifically, we assume access to a text-to-image diffusion model trained on a small amount of public data, and design a DP retrieval mechanism to augment the text prompt with samples retrieved from a private retrieval dataset. Our \emph{differentially private retrieval-augmented diffusion model} (DP-RDM) requires no fine-tuning on the retrieval dataset to adapt to another domain, and can use state-of-the-art generative models to generate high-quality image samples while satisfying rigorous DP guarantees. For instance, when evaluated on MS-COCO, our DP-RDM can generate samples with a privacy budget of $\epsilon=10$, while providing a $3.5$ point improvement in FID compared to public-only retrieval for up to $10,000$ queries.

Via

Access Paper or Ask Questions

Benchmarking Neural Network Training Algorithms

Jun 12, 2023

George E. Dahl, Frank Schneider, Zachary Nado, Naman Agarwal, Chandramouli Shama Sastry, Philipp Hennig, Sourabh Medapati, Runa Eschenhagen, Priya Kasimbeg, Daniel Suo(+15 more)

Figure 1 for Benchmarking Neural Network Training Algorithms

Figure 2 for Benchmarking Neural Network Training Algorithms

Figure 3 for Benchmarking Neural Network Training Algorithms

Figure 4 for Benchmarking Neural Network Training Algorithms

Abstract:Training algorithms, broadly construed, are an essential part of every deep learning pipeline. Training algorithm improvements that speed up training across a wide variety of workloads (e.g., better update rules, tuning protocols, learning rate schedules, or data selection schemes) could save time, save computational resources, and lead to better, more accurate, models. Unfortunately, as a community, we are currently unable to reliably identify training algorithm improvements, or even determine the state-of-the-art training algorithm. In this work, using concrete experiments, we argue that real progress in speeding up training requires new benchmarks that resolve three basic challenges faced by empirical comparisons of training algorithms: (1) how to decide when training is complete and precisely measure training time, (2) how to handle the sensitivity of measurements to exact workload details, and (3) how to fairly compare algorithms that require hyperparameter tuning. In order to address these challenges, we introduce a new, competitive, time-to-result benchmark using multiple workloads running on fixed hardware, the AlgoPerf: Training Algorithms benchmark. Our benchmark includes a set of workload variants that make it possible to detect benchmark submissions that are more robust to workload changes than current widely-used methods. Finally, we evaluate baseline submissions constructed using various optimizers that represent current practice, as well as other optimizers that have recently received attention in the literature. These baseline results collectively demonstrate the feasibility of our benchmark, show that non-trivial gaps between methods exist, and set a provisional state-of-the-art for future benchmark submissions to try and surpass.

* 102 pages, 8 figures, 41 tables

Via

Access Paper or Ask Questions

The Interpolated MVU Mechanism For Communication-efficient Private Federated Learning

Nov 08, 2022

Chuan Guo, Kamalika Chaudhuri, Pierre Stock, Mike Rabbat

Figure 1 for The Interpolated MVU Mechanism For Communication-efficient Private Federated Learning

Figure 2 for The Interpolated MVU Mechanism For Communication-efficient Private Federated Learning

Figure 3 for The Interpolated MVU Mechanism For Communication-efficient Private Federated Learning

Figure 4 for The Interpolated MVU Mechanism For Communication-efficient Private Federated Learning

Abstract:We consider private federated learning (FL), where a server aggregates differentially private gradient updates from a large number of clients in order to train a machine learning model. The main challenge is balancing privacy with both classification accuracy of the learned model as well as the amount of communication between the clients and server. In this work, we build on a recently proposed method for communication-efficient private FL -- the MVU mechanism -- by introducing a new interpolation mechanism that can accommodate a more efficient privacy analysis. The result is the new Interpolated MVU mechanism that provides SOTA results on communication-efficient private FL on a variety of datasets.

Via

Access Paper or Ask Questions

Towards Fair Federated Recommendation Learning: Characterizing the Inter-Dependence of System and Data Heterogeneity

May 30, 2022

Kiwan Maeng, Haiyu Lu, Luca Melis, John Nguyen, Mike Rabbat, Carole-Jean Wu

Figure 1 for Towards Fair Federated Recommendation Learning: Characterizing the Inter-Dependence of System and Data Heterogeneity

Figure 2 for Towards Fair Federated Recommendation Learning: Characterizing the Inter-Dependence of System and Data Heterogeneity

Figure 3 for Towards Fair Federated Recommendation Learning: Characterizing the Inter-Dependence of System and Data Heterogeneity

Figure 4 for Towards Fair Federated Recommendation Learning: Characterizing the Inter-Dependence of System and Data Heterogeneity

Abstract:Federated learning (FL) is an effective mechanism for data privacy in recommender systems by running machine learning model training on-device. While prior FL optimizations tackled the data and system heterogeneity challenges faced by FL, they assume the two are independent of each other. This fundamental assumption is not reflective of real-world, large-scale recommender systems -- data and system heterogeneity are tightly intertwined. This paper takes a data-driven approach to show the inter-dependence of data and system heterogeneity in real-world data and quantifies its impact on the overall model quality and fairness. We design a framework, RF^2, to model the inter-dependence and evaluate its impact on state-of-the-art model optimization techniques for federated recommendation tasks. We demonstrate that the impact on fairness can be severe under realistic heterogeneity scenarios, by up to 15.8--41x compared to a simple setup assumed in most (if not all) prior work. It means when realistic system-induced data heterogeneity is not properly modeled, the fairness impact of an optimization can be downplayed by up to 41x. The result shows that modeling realistic system-induced data heterogeneity is essential to achieving fair federated recommendation learning. We plan to open-source RF^2 to enable future design and evaluation of FL innovations.

Via

Access Paper or Ask Questions

Privacy-Aware Compression for Federated Data Analysis

Mar 15, 2022

Kamalika Chaudhuri, Chuan Guo, Mike Rabbat

Figure 1 for Privacy-Aware Compression for Federated Data Analysis

Figure 2 for Privacy-Aware Compression for Federated Data Analysis

Figure 3 for Privacy-Aware Compression for Federated Data Analysis

Figure 4 for Privacy-Aware Compression for Federated Data Analysis

Abstract:Federated data analytics is a framework for distributed data analysis where a server compiles noisy responses from a group of distributed low-bandwidth user devices to estimate aggregate statistics. Two major challenges in this framework are privacy, since user data is often sensitive, and compression, since the user devices have low network bandwidth. Prior work has addressed these challenges separately by combining standard compression algorithms with known privacy mechanisms. In this work, we take a holistic look at the problem and design a family of privacy-aware compression mechanisms that work for any given communication budget. We first propose a mechanism for transmitting a single real number that has optimal variance under certain conditions. We then show how to extend it to metric differential privacy for location privacy use-cases, as well as vectors, for application to federated learning. Our experiments illustrate that our mechanism can lead to better utility vs. compression trade-offs for the same privacy loss in a number of settings.

Via

Access Paper or Ask Questions

Papaya: Practical, Private, and Scalable Federated Learning

Nov 08, 2021

Dzmitry Huba, John Nguyen, Kshitiz Malik, Ruiyu Zhu, Mike Rabbat, Ashkan Yousefpour, Carole-Jean Wu, Hongyuan Zhan, Pavel Ustinov, Harish Srinivas(+4 more)

Figure 1 for Papaya: Practical, Private, and Scalable Federated Learning

Figure 2 for Papaya: Practical, Private, and Scalable Federated Learning

Figure 3 for Papaya: Practical, Private, and Scalable Federated Learning

Figure 4 for Papaya: Practical, Private, and Scalable Federated Learning

Abstract:Cross-device Federated Learning (FL) is a distributed learning paradigm with several challenges that differentiate it from traditional distributed learning, variability in the system characteristics on each device, and millions of clients coordinating with a central server being primary ones. Most FL systems described in the literature are synchronous - they perform a synchronized aggregation of model updates from individual clients. Scaling synchronous FL is challenging since increasing the number of clients training in parallel leads to diminishing returns in training speed, analogous to large-batch training. Moreover, stragglers hinder synchronous FL training. In this work, we outline a production asynchronous FL system design. Our work tackles the aforementioned issues, sketches of some of the system design challenges and their solutions, and touches upon principles that emerged from building a production FL system for millions of clients. Empirically, we demonstrate that asynchronous FL converges faster than synchronous FL when training across nearly one hundred million devices. In particular, in high concurrency settings, asynchronous FL is 5x faster and has nearly 8x less communication overhead than synchronous FL.

Via

Access Paper or Ask Questions

Sustainable AI: Environmental Implications, Challenges and Opportunities

Oct 30, 2021

Carole-Jean Wu, Ramya Raghavendra, Udit Gupta, Bilge Acun, Newsha Ardalani, Kiwan Maeng, Gloria Chang, Fiona Aga Behram, James Huang, Charles Bai(+15 more)

Figure 1 for Sustainable AI: Environmental Implications, Challenges and Opportunities

Figure 2 for Sustainable AI: Environmental Implications, Challenges and Opportunities

Figure 3 for Sustainable AI: Environmental Implications, Challenges and Opportunities

Figure 4 for Sustainable AI: Environmental Implications, Challenges and Opportunities

Abstract:This paper explores the environmental impact of the super-linear growth trends for AI from a holistic perspective, spanning Data, Algorithms, and System Hardware. We characterize the carbon footprint of AI computing by examining the model development cycle across industry-scale machine learning use cases and, at the same time, considering the life cycle of system hardware. Taking a step further, we capture the operational and manufacturing carbon footprint of AI computing and present an end-to-end analysis for what and how hardware-software design and at-scale optimization can help reduce the overall carbon footprint of AI. Based on the industry experience and lessons learned, we share the key challenges and chart out important development directions across the many dimensions of AI. We hope the key messages and insights presented in this paper can inspire the community to advance the field of AI in an environmentally-responsible manner.

Via

Access Paper or Ask Questions