Andrew Tomkins

Substance or Style: What Does Your Image Embedding Know?

Jul 10, 2023

Cyrus Rashtchian, Charles Herrmann, Chun-Sung Ferng, Ayan Chakrabarti, Dilip Krishnan, Deqing Sun, Da-Cheng Juan, Andrew Tomkins

Abstract:Probes are small networks that predict properties of underlying data from embeddings, and they provide a targeted, effective way to illuminate the information contained in embeddings. While analysis through the use of probes has become standard in NLP, there has been much less exploration in vision. Image foundation models have primarily been evaluated for semantic content. Better understanding the non-semantic information in popular embeddings (e.g., MAE, SimCLR, or CLIP) will shed new light both on the training algorithms and on the uses for these foundation models. We design a systematic transformation prediction task and measure the visual content of embeddings along many axes, including image style, quality, and a range of natural and artificial transformations. Surprisingly, six embeddings (including SimCLR) encode enough non-semantic information to identify dozens of transformations. We also consider a generalization task, where we group similar transformations and hold out several for testing. We find that image-text models (CLIP and ALIGN) are better at recognizing new examples of style transfer than masking-based models (CAN and MAE). Overall, our results suggest that the choice of pre-training algorithm impacts the types of information in the embedding, and certain models are better than others for non-semantic downstream tasks.

* 27 pages, 9 figures

Approximating a RUM from Distributions on k-Slates

May 22, 2023

Flavio Chierichetti, Mirko Giacchini, Ravi Kumar, Alessandro Panconesi, Andrew Tomkins

Abstract:In this work we consider the problem of fitting Random Utility Models (RUMs) to user choices. Given the winner distributions of the subsets of size $k$ of a universe, we obtain a polynomial-time algorithm that finds the RUM that best approximates the given distribution on average. Our algorithm is based on a linear program that we solve using the ellipsoid method. Given that its corresponding separation oracle problem is NP-hard, we devise an approximate separation oracle that can be viewed as a generalization of the weighted feedback arc set problem to hypergraphs. Our theoretical result can also be made practical: we obtain a heuristic that is effective and scales to real-world datasets.

* Proceedings of The 26th International Conference on Artificial Intelligence and Statistics (AISTATS), 2023, pages 4757-4767, volume 206

CARLS: Cross-platform Asynchronous Representation Learning System

May 26, 2021

Chun-Ta Lu, Yun Zeng, Da-Cheng Juan, Yicheng Fan, Zhe Li, Jan Dlabal, Yi-Ting Chen, Arjun Gopalan, Allan Heydon, Chun-Sung Ferng (+6 more)

Abstract:In this work, we propose CARLS, a novel framework for augmenting the capacity of existing deep learning frameworks by enabling multiple components -- model trainers, knowledge makers and knowledge banks -- to work together in concert in an asynchronous fashion across hardware platforms. The proposed CARLS is particularly suitable for learning paradigms where model training benefits from additional knowledge inferred or discovered during training, such as node embeddings for graph neural networks or reliable pseudo labels from model predictions. We also describe three learning paradigms -- semi-supervised learning, curriculum learning and multimodal learning -- as examples that can be scaled up efficiently by CARLS. One version of CARLS has been open-sourced and is available for download at: https://github.com/tensorflow/neural-structured-learning/tree/master/research/carls

Graph Autoencoders with Deconvolutional Networks

Dec 22, 2020

Jia Li, Tomas Yu, Da-Cheng Juan, Arjun Gopalan, Hong Cheng, Andrew Tomkins

Abstract:Recent studies have indicated that Graph Convolutional Networks (GCNs) act as a low pass filter in the spectral domain and encode smoothed node representations. In this paper, we consider their opposite, namely Graph Deconvolutional Networks (GDNs) that reconstruct graph signals from smoothed node representations. We motivate the design of Graph Deconvolutional Networks via a combination of inverse filters in the spectral domain and de-noising layers in the wavelet domain, as the inverse operation results in a high pass filter and may amplify the noise. Based on the proposed GDN, we further propose a graph autoencoder framework that first encodes smoothed graph representations with GCN and then decodes accurate graph signals with GDN. We demonstrate the effectiveness of the proposed method on several tasks including unsupervised graph-level representation, social recommendation and graph generation.

Adversarial Robustness Across Representation Spaces

Dec 01, 2020

Pranjal Awasthi, George Yu, Chun-Sung Ferng, Andrew Tomkins, Da-Cheng Juan

Abstract:Adversarial robustness corresponds to the susceptibility of deep neural networks to imperceptible perturbations made at test time. In the context of image tasks, many algorithms have been proposed to make neural networks robust to adversarial perturbations made to the input pixels. These perturbations are typically measured in an $\ell_p$ norm. However, robustness often holds only for the specific attack used for training. In this work we extend the above setting to consider the problem of training deep neural networks that can be made simultaneously robust to perturbations applied in multiple natural representation spaces. For the case of image data, examples include the standard pixel representation as well as the representation in the discrete cosine transform (DCT) basis. We design a theoretically sound algorithm with formal guarantees for the above problem. Furthermore, our guarantees also hold when the goal is to require robustness with respect to multiple $\ell_p$ norm based attacks. We then derive an efficient practical implementation and demonstrate the effectiveness of our approach on standard datasets for image classification.
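
To make the idea of different representation spaces concrete, the sketch below compares the size of one perturbation in the pixel basis and in the DCT basis. It is only an illustration and not the paper's algorithm: the helper names (`dct_ortho`, `l2`, `linf`), the 1-D signal of 64 "pixels", and the budget `eps` are all assumptions chosen for the example. An orthonormal DCT preserves the $\ell_2$ norm, but a perturbation that is small in $\ell_\infty$ in pixel space can be much larger in $\ell_\infty$ in DCT space.

```python
import math

def dct_ortho(x):
    """Orthonormal DCT-II of a 1-D sequence (pure Python, O(n^2))."""
    n = len(x)
    out = []
    for k in range(n):
        # Scaling that makes the transform orthonormal.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        acc = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        out.append(scale * acc)
    return out

def l2(v):
    return math.sqrt(sum(t * t for t in v))

def linf(v):
    return max(abs(t) for t in v)

# Hypothetical perturbation: shift every one of 64 pixels by eps.
eps = 0.01
n_pixels = 64
delta_pixel = [eps] * n_pixels
delta_dct = dct_ortho(delta_pixel)

print("pixel space: l2 =", l2(delta_pixel), " linf =", linf(delta_pixel))
print("DCT space:   l2 =", l2(delta_dct), " linf =", linf(delta_dct))
```

The $\ell_2$ norms agree, while the $\ell_\infty$ norm in the DCT basis is `eps * sqrt(n_pixels)` because all of the energy lands in the single constant (DC) coefficient. A budget that looks tight in one space can therefore be loose in the other.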

Surprise: Result List Truncation via Extreme Value Theory

Oct 19, 2020

Dara Bahri, Che Zheng, Yi Tay, Donald Metzler, Andrew Tomkins

Abstract:Work in information retrieval has largely been centered around ranking and relevance: given a query, return some number of results ordered by relevance to the user. The problem of result list truncation, or where to truncate the ranked list of results, however, has received less attention despite being crucial in a variety of applications. Such truncation is a balancing act between the overall relevance, or usefulness of the results, and the user cost of processing more results. Result list truncation can be challenging because relevance scores are often not well-calibrated. This is particularly true in large-scale IR systems where documents and queries are embedded in the same metric space and a query's nearest document neighbors are returned during inference. Here, relevance is inversely proportional to the distance between the query and candidate document, but what distance constitutes relevance varies from query to query and changes dynamically as more documents are added to the index. In this work, we propose Surprise scoring, a statistical method that leverages the Generalized Pareto distribution that arises in extreme value theory to produce interpretable and calibrated relevance scores at query time using nothing more than the ranked scores. We demonstrate its effectiveness on the result list truncation task across image, text, and IR datasets and compare it to both classical and recent baselines. We draw connections to hypothesis testing and $p$-values.

Generative Models are Unsupervised Predictors of Page Quality: A Colossal-Scale Study

Aug 17, 2020

Dara Bahri, Yi Tay, Che Zheng, Donald Metzler, Cliff Brunk, Andrew Tomkins

Abstract:Large generative language models such as GPT-2 are well-known for their ability to generate text as well as their utility in supervised downstream tasks via fine-tuning. Our work is twofold: firstly we demonstrate via human evaluation that classifiers trained to discriminate between human and machine-generated text emerge as unsupervised predictors of "page quality", able to detect low quality content without any training. This enables fast bootstrapping of quality indicators in a low-resource setting. Secondly, curious to understand the prevalence and nature of low quality pages in the wild, we conduct extensive qualitative and quantitative analysis over 500 million web articles, making this the largest-scale study ever conducted on the topic.

BusTr: Predicting Bus Travel Times from Real-Time Traffic

Jul 02, 2020

Richard Barnes, Senaka Buthpitiya, James Cook, Alex Fabrikant, Andrew Tomkins, Fangzhou Xu

Abstract:We present BusTr, a machine-learned model for translating road traffic forecasts into predictions of bus delays, used by Google Maps to serve the majority of the world's public transit systems where no official real-time bus tracking is provided. We demonstrate that our neural sequence model improves over DeepTTE, the state-of-the-art baseline, both in performance (-30% MAPE) and training stability. We also demonstrate significant generalization gains over simpler models, evaluated on longitudinal data to cope with a constantly evolving world.

* 14 pages, 2 figures, 5 tables. Citation: "Richard Barnes, Senaka Buthpitiya, James Cook, Alex Fabrikant, Andrew Tomkins, Fangzhou Xu (2020). BusTr: Predicting Bus Travel Times from Real-Time Traffic. 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. doi: 10.1145/3394486.3403376"

Choppy: Cut Transformer For Ranked List Truncation

Apr 26, 2020

Dara Bahri, Yi Tay, Che Zheng, Donald Metzler, Andrew Tomkins

Abstract:Work in information retrieval has traditionally focused on ranking and relevance: given a query, return some number of results ordered by relevance to the user. However, the problem of determining how many results to return, i.e. how to optimally truncate the ranked result list, has received less attention despite being of critical importance in a range of applications. Such truncation is a balancing act between the overall relevance, or usefulness of the results, and the user cost of processing more results. In this work, we propose Choppy, an assumption-free model based on the widely successful Transformer architecture, for the ranked list truncation problem. Needing nothing more than the relevance scores of the results, the model uses a powerful multi-head attention mechanism to directly optimize any user-defined IR metric. We show Choppy improves upon recent state-of-the-art methods.
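
The truncation objective itself can be illustrated without any model. The sketch below is not Choppy: it is a hypothetical oracle that sees the true relevance labels of a ranked list and picks the cutoff that maximizes F1, which is one example of a user-defined IR metric. The function name `best_cutoff_f1` and the sample labels are assumptions made for the illustration.

```python
def best_cutoff_f1(labels):
    """Return (k, f1) for the cutoff k that maximizes F1.

    labels: 0/1 relevance judgments in ranked order (best-scored first).
    Returns (0, 0.0) when the list contains no relevant result.
    """
    total_relevant = sum(labels)
    best_k, best_f1 = 0, 0.0
    hits = 0
    for k, rel in enumerate(labels, start=1):
        hits += rel
        if hits == 0:
            continue  # F1 is zero until the first relevant result appears
        precision = hits / k
        recall = hits / total_relevant
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_k, best_f1 = k, f1
    return best_k, best_f1

# Hypothetical ranked list: relevant results at ranks 1, 2 and 4.
ranked_labels = [1, 1, 0, 1, 0, 0, 0]
k, f1 = best_cutoff_f1(ranked_labels)
print(f"truncate after rank {k}, F1 = {f1:.3f}")
```

Returning more results raises recall but lowers precision once irrelevant items dominate, so the best cutoff sits where that trade-off peaks. A learned truncation method has to estimate this position from the scores alone, without access to the labels.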

* SIGIR 2020

Reverse Engineering Configurations of Neural Text Generation Models

Apr 13, 2020

Yi Tay, Dara Bahri, Che Zheng, Clifford Brunk, Donald Metzler, Andrew Tomkins

Abstract:This paper seeks to develop a deeper understanding of the fundamental properties of neural text generation models. The study of artifacts that emerge in machine generated text as a result of modeling choices is a nascent research area. Previously, the extent and degree to which these artifacts surface in generated text has not been well studied. In the spirit of better understanding generative text models and their artifacts, we propose the new task of distinguishing which of several variants of a given model generated a piece of text, and we conduct an extensive suite of diagnostic tests to observe whether modeling choices (e.g., sampling methods, top-$k$ probabilities, model architectures, etc.) leave detectable artifacts in the text they generate. Our key finding, which is backed by a rigorous set of experiments, is that such artifacts are present and that different modeling choices can be inferred by observing the generated text alone. This suggests that neural text generators may be more sensitive to various modeling choices than previously thought.

* ACL 2020
