Teresa Yeo

An Analysis of Model Robustness across Concurrent Distribution Shifts

Jan 08, 2025

Myeongho Jeon, Suhwan Choi, Hyoje Lee, Teresa Yeo

Abstract: Machine learning models, meticulously optimized for source data, often fail to predict target data when faced with distribution shifts (DSs). Previous benchmarking studies, though extensive, have mainly focused on simple DSs. Recognizing that DSs often occur in more complex forms in real-world scenarios, we broadened our study to include multiple concurrent shifts, such as unseen domain shifts combined with spurious correlations. We evaluated 26 algorithms that range from simple heuristic augmentations to zero-shot inference using foundation models, across 168 source-target pairs from eight datasets. Our analysis of over 100K models reveals that (i) concurrent DSs typically worsen performance compared to a single shift, with certain exceptions, (ii) if a model improves generalization for one distribution shift, it tends to be effective for others, and (iii) heuristic data augmentations achieve the best overall performance on both synthetic and real-world datasets.

* Accepted to TMLR

ViPer: Visual Personalization of Generative Models via Individual Preference Learning

Jul 24, 2024

Sogand Salehi, Mahdi Shafiei, Teresa Yeo, Roman Bachmann, Amir Zamir

Abstract: Different users find different images generated for the same prompt desirable. This gives rise to personalized image generation, which involves creating images aligned with an individual's visual preference. Current generative models are, however, unpersonalized, as they are tuned to produce outputs that appeal to a broad audience. Using them to generate images aligned with individual users relies on iterative manual prompt engineering by the user, which is inefficient and undesirable. We propose to personalize the image generation process by first capturing the generic preferences of the user in a one-time process by inviting them to comment on a small selection of images, explaining why they like or dislike each. Based on these comments, we infer a user's structured liked and disliked visual attributes, i.e., their visual preference, using a large language model. These attributes are used to guide a text-to-image model toward producing images that are tuned towards the individual user's visual preference. Through a series of user studies and large language model guided evaluations, we demonstrate that the proposed method results in generations that are well aligned with individual users' visual preferences.

* Project page at https://viper.epfl.ch/

Controlled Training Data Generation with Diffusion Models

Mar 22, 2024

Teresa Yeo, Andrei Atanov, Harold Benoit, Aleksandr Alekseev, Ruchira Ray, Pooya Esmaeil Akhoondi, Amir Zamir

Abstract: In this work, we present a method to control a text-to-image generative model to produce training data specifically "useful" for supervised learning. Unlike previous works that employ an open-loop approach and pre-define prompts to generate new data using either a language model or human expertise, we develop an automated closed-loop system which involves two feedback mechanisms. The first mechanism uses feedback from a given supervised model and finds adversarial prompts that result in image generations that maximize the model loss. While these adversarial prompts result in diverse data informed by the model, they are not informed of the target distribution, which can be inefficient. Therefore, we introduce the second feedback mechanism that guides the generation process towards a certain target distribution. We call the method combining these two mechanisms Guided Adversarial Prompts. We perform our evaluations on different tasks, datasets and architectures, with different types of distribution shifts (spuriously correlated data, unseen domains) and demonstrate the efficiency of the proposed feedback mechanisms compared to open-loop approaches.

* Project page at https://adversarial-prompts.epfl.ch/

4M: Massively Multimodal Masked Modeling

Dec 11, 2023

David Mizrahi, Roman Bachmann, Oğuzhan Fatih Kar, Teresa Yeo, Mingfei Gao, Afshin Dehghan, Amir Zamir

Abstract: Current machine learning models for vision are often highly specialized and limited to a single modality and task. In contrast, recent large language models exhibit a wide range of capabilities, hinting at a possibility for similarly versatile models in computer vision. In this paper, we take a step in this direction and propose a multimodal training scheme called 4M. It consists of training a single unified Transformer encoder-decoder using a masked modeling objective across a wide range of input/output modalities, including text, images, geometric, and semantic modalities, as well as neural network feature maps. 4M achieves scalability by unifying the representation space of all modalities through mapping them into discrete tokens and performing multimodal masked modeling on a small randomized subset of tokens. 4M leads to models that exhibit several key capabilities: (1) they can perform a diverse set of vision tasks out of the box, (2) they excel when fine-tuned for unseen downstream tasks or new input modalities, and (3) they can function as a generative model that can be conditioned on arbitrary modalities, enabling a wide variety of expressive multimodal editing capabilities with remarkable flexibility. Through experimental analyses, we demonstrate the potential of 4M for training versatile and scalable foundation models for vision tasks, setting the stage for further exploration in multimodal learning for vision and other domains.

* NeurIPS 2023 Spotlight. Project page at https://4m.epfl.ch/
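
The 4M abstract describes mapping every modality to discrete tokens and training on a small random subset of them. The sketch below illustrates only that sampling step, under stated assumptions: the function name `sample_input_target_subsets`, the uniform sampling over a pooled token list, and the fixed input/target budgets are hypothetical choices for illustration and are not taken from the paper.

```python
import random
from typing import Dict, List, Tuple

# A token is identified by (modality name, position in that modality, token id).
Token = Tuple[str, int, int]


def sample_input_target_subsets(
    tokens_by_modality: Dict[str, List[int]],
    num_inputs: int,
    num_targets: int,
    seed: int = 0,
) -> Tuple[List[Token], List[Token]]:
    """Pick disjoint random subsets of tokens to serve as inputs and targets.

    Assumption: tokens from all modalities are pooled and sampled uniformly.
    The actual sampling scheme used by 4M may differ.
    """
    rng = random.Random(seed)
    pool: List[Token] = [
        (modality, position, token_id)
        for modality, token_ids in tokens_by_modality.items()
        for position, token_id in enumerate(token_ids)
    ]
    if num_inputs + num_targets > len(pool):
        raise ValueError("token budget exceeds the number of available tokens")
    # Sampling without replacement guarantees inputs and targets do not overlap.
    chosen = rng.sample(pool, num_inputs + num_targets)
    return chosen[:num_inputs], chosen[num_inputs:]
```

A model trained this way would see only the input subset and be asked to predict the target subset, which keeps the per-step cost tied to the token budget rather than to the total number of modalities.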

Rapid Network Adaptation: Learning to Adapt Neural Networks Using Test-Time Feedback

Sep 27, 2023

Teresa Yeo, Oğuzhan Fatih Kar, Zahra Sodagar, Amir Zamir

Abstract: We propose a method for adapting neural networks to distribution shifts at test-time. In contrast to training-time robustness mechanisms that attempt to anticipate and counter the shift, we create a closed-loop system and make use of a test-time feedback signal to adapt a network on the fly. We show that this loop can be effectively implemented using a learning-based function, which realizes an amortized optimizer for the network. This leads to an adaptation method, named Rapid Network Adaptation (RNA), that is notably more flexible and orders of magnitude faster than the baselines. Through a broad set of experiments using various adaptation signals and target tasks, we study the efficiency and flexibility of this method. We perform the evaluations using various datasets (Taskonomy, Replica, ScanNet, Hypersim, COCO, ImageNet), tasks (depth, optical flow, semantic segmentation, classification), and distribution shifts (Cross-datasets, 2D and 3D Common Corruptions) with promising results. We end with a discussion on general formulations for handling distribution shifts and our observations from comparing with similar approaches from other domains.

* Project website at https://rapid-network-adaptation.epfl.ch/

Task Discovery: Finding the Tasks that Neural Networks Generalize on

Dec 01, 2022

Andrei Atanov, Andrei Filatov, Teresa Yeo, Ajay Sohmshetty, Amir Zamir

Abstract: When developing deep learning models, we usually decide what task we want to solve, then search for a model that generalizes well on the task. An intriguing question would be: what if, instead of fixing the task and searching in the model space, we fix the model and search in the task space? Can we find tasks that the model generalizes on? What do they look like, and do they indicate anything? These are the questions we address in this paper. We propose a task discovery framework that automatically finds examples of such tasks via optimizing a generalization-based quantity called agreement score. We demonstrate that one set of images can give rise to many tasks on which neural networks generalize well. These tasks are a reflection of the inductive biases of the learning framework and the statistical patterns present in the data, thus they can make a useful tool for analysing the neural networks and their biases. As an example, we show that the discovered tasks can be used to automatically create adversarial train-test splits which make a model fail at test time, without changing the pixels or labels, but by only selecting how the datapoints should be split between the train and test sets. We end with a discussion on human-interpretability of the discovered tasks.

* NeurIPS 2022, Project page at https://taskdiscovery.epfl.ch
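
The Task Discovery abstract names an "agreement score" but does not define it. One plausible reading, assumed here and not confirmed by the text above, is the fraction of held-out points on which two independently trained models predict the same label. The function name `agreement_score` and its signature are hypothetical.

```python
from typing import Sequence


def agreement_score(preds_a: Sequence[int], preds_b: Sequence[int]) -> float:
    """Fraction of examples on which two models predict the same label.

    Assumption: both sequences hold hard label predictions for the same
    held-out examples, in the same order.
    """
    if len(preds_a) != len(preds_b) or len(preds_a) == 0:
        raise ValueError("prediction lists must be non-empty and equally long")
    matches = sum(1 for a, b in zip(preds_a, preds_b) if a == b)
    return matches / len(preds_a)
```

Under this reading, a task labeling with a score near 1.0 is one that differently initialized networks learn in the same way, which is the sense of "generalizes on" that the sketch assumes.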

3D Common Corruptions and Data Augmentation

Apr 04, 2022

Oğuzhan Fatih Kar, Teresa Yeo, Andrei Atanov, Amir Zamir

Abstract: We introduce a set of image transformations that can be used as corruptions to evaluate the robustness of models as well as data augmentation mechanisms for training neural networks. The primary distinction of the proposed transformations is that, unlike existing approaches such as Common Corruptions, the geometry of the scene is incorporated in the transformations, thus leading to corruptions that are more likely to occur in the real world. We also introduce a set of semantic corruptions (e.g. natural object occlusions). We show these transformations are 'efficient' (can be computed on-the-fly), 'extendable' (can be applied on most image datasets), expose vulnerability of existing models, and can effectively make models more robust when employed as '3D data augmentation' mechanisms. The evaluations on several tasks and datasets suggest incorporating 3D information into benchmarking and training opens up a promising direction for robustness research.

* CVPR 2022 (Oral). Project website at https://3dcommoncorruptions.epfl.ch/

Robustness via Cross-Domain Ensembles

Mar 19, 2021

Teresa Yeo, Oğuzhan Fatih Kar, Amir Zamir

Abstract: We present a method for making neural network predictions robust to shifts from the training data distribution. The proposed method is based on making predictions via a diverse set of cues (called 'middle domains') and ensembling them into one strong prediction. The premise of the idea is that predictions made via different cues respond differently to a distribution shift, hence one should be able to merge them into one robust final prediction. We perform the merging in a straightforward but principled manner based on the uncertainty associated with each prediction. The evaluations are performed using multiple tasks and datasets (Taskonomy, Replica, ImageNet, CIFAR) under a wide range of adversarial and non-adversarial distribution shifts which demonstrate the proposed method is considerably more robust than its standard learning counterpart, conventional deep ensembles, and several other baselines.

* Project website at https://crossdomain-ensembles.epfl.ch/
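
The Cross-Domain Ensembles abstract says predictions from different cues are merged based on the uncertainty associated with each one, without giving the rule. The sketch below uses inverse-variance weighting as one standard way to do this. It is a stand-in chosen for illustration, not necessarily the paper's merging rule, and the name `merge_by_inverse_variance` is hypothetical.

```python
from typing import Sequence


def merge_by_inverse_variance(
    means: Sequence[float],
    variances: Sequence[float],
    eps: float = 1e-12,
) -> float:
    """Combine per-path predictions, trusting low-uncertainty paths more.

    Assumption: each path reports a scalar prediction and a variance that
    serves as its uncertainty estimate. `eps` guards against division by zero.
    """
    if len(means) != len(variances) or len(means) == 0:
        raise ValueError("means and variances must be non-empty and equally long")
    weights = [1.0 / (variance + eps) for variance in variances]
    total_weight = sum(weights)
    return sum(w * m for w, m in zip(weights, means)) / total_weight
```

With equal variances this reduces to a plain average, and a path whose variance grows under a distribution shift contributes correspondingly less to the merged output.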

Robust Learning Through Cross-Task Consistency

Jun 07, 2020

Amir Zamir, Alexander Sax, Teresa Yeo, Oğuzhan Kar, Nikhil Cheerla, Rohan Suri, Zhangjie Cao, Jitendra Malik, Leonidas Guibas

Abstract: Visual perception entails solving a wide set of tasks, e.g., object detection, depth estimation, etc. The predictions made for multiple tasks from the same image are not independent, and therefore, are expected to be consistent. We propose a broadly applicable and fully computational method for augmenting learning with Cross-Task Consistency. The proposed formulation is based on inference-path invariance over a graph of arbitrary tasks. We observe that learning with cross-task consistency leads to more accurate predictions and better generalization to out-of-distribution inputs. This framework also leads to an informative unsupervised quantity, called Consistency Energy, based on measuring the intrinsic consistency of the system. Consistency Energy correlates well with the supervised error (r=0.67), thus it can be employed as an unsupervised confidence metric as well as for detection of out-of-distribution inputs (ROC-AUC=0.95). The evaluations are performed on multiple datasets, including Taskonomy, Replica, CocoDoom, and ApolloScape, and they benchmark cross-task consistency versus various baselines including conventional multi-task learning, cycle consistency, and analytical consistency.

* CVPR 2020 (Oral). Project website, models, live demo at http://consistency.epfl.ch/

Iterative Classroom Teaching

Nov 12, 2018

Teresa Yeo, Parameswaran Kamalaruban, Adish Singla, Arpit Merchant, Thibault Asselborn, Louis Faucon, Pierre Dillenbourg, Volkan Cevher

Abstract: We consider the machine teaching problem in a classroom-like setting wherein the teacher has to deliver the same examples to a diverse group of students. Their diversity stems from differences in their initial internal states as well as their learning rates. We prove that a teacher with full knowledge about the learning dynamics of the students can teach a target concept to the entire classroom using O(min{d,N} log(1/eps)) examples, where d is the ambient dimension of the problem, N is the number of learners, and eps is the accuracy parameter. We show the robustness of our teaching strategy when the teacher has limited knowledge of the learners' internal dynamics as provided by a noisy oracle. Further, we study the trade-off between the learners' workload and the teacher's cost in teaching the target concept. Our experiments validate our theoretical results and suggest that appropriately partitioning the classroom into homogeneous groups provides a balance between these two objectives.

* AAAI'19 (extended version)
