Sujith Ravi

Approximating 1-Wasserstein Distance with Trees

Jun 24, 2022

Makoto Yamada, Yuki Takezawa, Ryoma Sato, Han Bao, Zornitsa Kozareva, Sujith Ravi

Abstract: Wasserstein distance, which measures the discrepancy between distributions, shows efficacy in various types of natural language processing (NLP) and computer vision (CV) applications. One of the challenges in estimating Wasserstein distance is that it is computationally expensive and does not scale well for many distribution comparison tasks. In this paper, we aim to approximate the 1-Wasserstein distance by the tree-Wasserstein distance (TWD), where TWD is a 1-Wasserstein distance with tree-based embedding and can be computed in linear time with respect to the number of nodes on a tree. More specifically, we propose a simple yet efficient L1-regularized approach to learning the weights of the edges in a tree. To this end, we first show that the 1-Wasserstein approximation problem can be formulated as a distance approximation problem using the shortest path distance on a tree. We then show that the shortest path distance can be represented by a linear model and can be formulated as a Lasso-based regression problem. Owing to the convex formulation, we can obtain a globally optimal solution efficiently. Moreover, we propose a tree-sliced variant of these methods. Through experiments, we demonstrate that the weighted TWD can accurately approximate the original 1-Wasserstein distance.
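
To illustrate the linear-time claim, the following is a minimal sketch of the standard tree-Wasserstein formula, in which each edge contributes its weight times the absolute difference in mass below that edge. It is not the authors' code: the function name, the parent-array tree encoding, and the requirement that every node has a larger index than its parent are assumptions made for this example.

```python
def tree_wasserstein(parent, weight, mu, nu):
    """Tree-Wasserstein distance between two distributions on tree nodes.

    parent[v] is the parent of node v (the root is node 0, parent -1).
    weight[v] is the weight of the edge between v and parent[v].
    Assumption for this sketch: parent[v] < v for every non-root node,
    so a reverse scan visits every child before its parent.
    """
    n = len(parent)
    # diff[v] accumulates (mass of mu - mass of nu) in the subtree rooted at v.
    diff = [mu[v] - nu[v] for v in range(n)]
    total = 0.0
    for v in range(n - 1, 0, -1):
        total += weight[v] * abs(diff[v])  # cost of mass crossing edge (v, parent[v])
        diff[parent[v]] += diff[v]         # push the subtree imbalance upward
    return total


# Path 0 - 1 - 2 with edge weights 1 and 2: moving all mass from node 0 to node 2.
distance = tree_wasserstein([-1, 0, 1], [0.0, 1.0, 2.0], [1, 0, 0], [0, 0, 1])
```

Each node is visited once, so the cost is linear in the number of nodes, which matches the complexity stated in the abstract.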

Fixed Support Tree-Sliced Wasserstein Barycenter

Sep 08, 2021

Yuki Takezawa, Ryoma Sato, Zornitsa Kozareva, Sujith Ravi, Makoto Yamada

Abstract: The Wasserstein barycenter has been widely studied in various fields, including natural language processing and computer vision. However, solving the Wasserstein barycenter problem is computationally expensive because computing the Wasserstein distance takes quadratic time with respect to the number of supports. By contrast, the Wasserstein distance on a tree, called the tree-Wasserstein distance, can be computed in linear time and allows for the fast comparison of a large number of distributions. In this study, we propose a barycenter under the tree-Wasserstein distance, called the fixed support tree-Wasserstein barycenter (FS-TWB), and its extension, called the fixed support tree-sliced Wasserstein barycenter (FS-TSWB). More specifically, we first show that the FS-TWB and FS-TSWB problems are convex optimization problems and can be solved by projected subgradient descent. Moreover, we propose a more efficient algorithm to compute the subgradient and objective function value by using the properties of tree-Wasserstein barycenter problems. Through real-world experiments, we show that, by using the proposed algorithm, the FS-TWB and FS-TSWB can be solved two orders of magnitude faster than the original Wasserstein barycenter.
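
As a rough illustration of the projection step in projected subgradient descent, the sketch below shows the well-known sort-based Euclidean projection onto the probability simplex, followed by one generic update. It assumes the barycenter weights live on the simplex over a fixed support; the function names, the step size, and the placeholder subgradient are hypothetical, and the paper's efficient subgradient computation is not reproduced here.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        candidate = (cumulative - 1.0) / j
        if u_j - candidate > 0:
            theta = candidate  # keep the threshold from the largest valid j
    return [max(x - theta, 0.0) for x in v]


def projected_subgradient_step(a, subgradient, step_size):
    """One update: move against the subgradient, then project back."""
    moved = [a_i - step_size * g_i for a_i, g_i in zip(a, subgradient)]
    return project_to_simplex(moved)


# Hypothetical subgradient values, used only to exercise the update.
updated = projected_subgradient_step([0.5, 0.5], [1.0, -1.0], 0.25)
```

The projection keeps every iterate a valid distribution over the fixed support, which is what makes the plain subgradient update usable on a constrained problem.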

Analyzing the Abstractiveness-Factuality Tradeoff With Nonlinear Abstractiveness Constraints

Aug 05, 2021

Markus Dreyer, Mengwen Liu, Feng Nan, Sandeep Atluri, Sujith Ravi

Abstract: We analyze the tradeoff between factuality and abstractiveness of summaries. We introduce abstractiveness constraints to control the degree of abstractiveness at decoding time, and we apply this technique to characterize the abstractiveness-factuality tradeoff across multiple widely studied datasets, using extensive human evaluations. We train a neural summarization model on each dataset and visualize the rates of change in factuality as we gradually increase abstractiveness using our abstractiveness constraints. We observe that, while factuality generally drops with increased abstractiveness, different datasets lead to different rates of factuality decay. We propose new measures to quantify the tradeoff between factuality and abstractiveness, including muQAGS, which balances factuality with abstractiveness. We also quantify this tradeoff in previous works, aiming to establish baselines for the abstractiveness-factuality tradeoff that future publications can compare against.

Efficient Retrieval Optimized Multi-task Learning

Apr 20, 2021

Hengxin Fun, Sunil Gandhi, Sujith Ravi

Abstract: Recently, there have been significant advances in neural methods for tackling knowledge-intensive tasks such as open domain question answering (QA). These advances are fueled by combining large pre-trained language models with learnable retrieval of documents. The majority of these models use separate encoders for learning the query representation and the passage representation for the retriever, plus an additional encoder for the downstream task. Using separate encoders for each stage or task occupies a lot of memory and makes it difficult to scale to a large number of tasks. In this paper, we propose a novel Retrieval Optimized Multi-task (ROM) framework for jointly training self-supervised tasks, knowledge retrieval, and extractive question answering. Our ROM approach presents a unified and generalizable framework that enables scaling efficiently to multiple tasks, varying levels of supervision, and optimization choices such as different learning schedules without changing the model architecture. It also provides the flexibility of changing the encoders without changing the architecture of the system. Using our framework, we achieve comparable or better performance than recent methods on QA, while drastically reducing the number of parameters.

Transductive Learning for Abstractive News Summarization

Apr 17, 2021

Arthur Bražinskas, Mengwen Liu, Ramesh Nallapati, Sujith Ravi, Markus Dreyer

Abstract: Pre-trained language models have recently advanced abstractive summarization. These models are further fine-tuned on human-written references before summary generation at test time. In this work, we propose the first application of transductive learning to summarization. In this paradigm, a model can learn from the test set's input before inference. To perform transduction, we propose using summarizing sentences from the input document to construct references for learning at test time. These sentences are often compressed and fused to form abstractive summaries and provide omitted details and additional context to the reader. We show that our approach yields state-of-the-art results on CNN/DM and NYT datasets. For instance, we achieve over 1 ROUGE-L point improvement on CNN/DM. Further, we show the benefits of transduction from older to more recent news. Finally, through human and automatic evaluation, we show that our summaries become more abstractive and coherent.

GoEmotions: A Dataset of Fine-Grained Emotions

Jun 03, 2020

Dorottya Demszky, Dana Movshovitz-Attias, Jeongwoo Ko, Alan Cowen, Gaurav Nemade, Sujith Ravi

Abstract: Understanding emotion expressed in language has a wide range of applications, from building empathetic chatbots to detecting harmful online behavior. Advancement in this area can be improved using large-scale datasets with a fine-grained typology, adaptable to multiple downstream tasks. We introduce GoEmotions, the largest manually annotated dataset of 58k English Reddit comments, labeled for 27 emotion categories or Neutral. We demonstrate the high quality of the annotations via Principal Preserved Component Analysis. We conduct transfer learning experiments with existing emotion benchmarks to show that our dataset generalizes well to other domains and different emotion taxonomies. Our BERT-based model achieves an average F1-score of .46 across our proposed taxonomy, leaving much room for improvement.

* Accepted to ACL 2020

Low-Dimensional Hyperbolic Knowledge Graph Embeddings

May 01, 2020

Ines Chami, Adva Wolf, Da-Cheng Juan, Frederic Sala, Sujith Ravi, Christopher Ré

Abstract: Knowledge graph (KG) embeddings learn low-dimensional representations of entities and relations to predict missing facts. KGs often exhibit hierarchical and logical patterns which must be preserved in the embedding space. For hierarchical data, hyperbolic embedding methods have shown promise for high-fidelity and parsimonious representations. However, existing hyperbolic embedding methods do not account for the rich logical patterns in KGs. In this work, we introduce a class of hyperbolic KG embedding models that simultaneously capture hierarchical and logical patterns. Our approach combines hyperbolic reflections and rotations with attention to model complex relational patterns. Experimental results on standard KG benchmarks show that our method improves over previous Euclidean- and hyperbolic-based efforts by up to 6.1% in mean reciprocal rank (MRR) in low dimensions. Furthermore, we observe that different geometric transformations capture different types of relations while attention-based transformations generalize to multiple relations. In high dimensions, our approach yields new state-of-the-art MRRs of 49.6% on WN18RR and 57.7% on YAGO3-10.
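
For readers unfamiliar with the metric, mean reciprocal rank averages 1/rank of the correct entity over all test queries. A minimal sketch follows; the function name and the example ranks are made up for illustration and are not taken from the paper.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank, where rank 1 means the correct entity was ranked first."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Three hypothetical queries whose correct entities were ranked 1st, 2nd, and 4th.
example_mrr = mean_reciprocal_rank([1, 2, 4])
```

Because of the reciprocal, improvements near the top of the ranking move the score far more than improvements deep in the list.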

ProFormer: Towards On-Device LSH Projection Based Transformers

Apr 13, 2020

Chinnadhurai Sankar, Sujith Ravi, Zornitsa Kozareva

Abstract: At the heart of text-based neural models lie word representations, which are powerful but occupy a lot of memory, making it challenging to deploy them to devices with memory constraints such as mobile phones, watches, and IoT devices. To surmount these challenges, we introduce ProFormer, a projection-based transformer architecture that is faster and lighter, making it suitable for deployment on memory-constrained devices while preserving user privacy. We use an LSH projection layer to dynamically generate word representations on the fly without embedding lookup tables, reducing the memory footprint from O(V·d) to O(T), where V is the vocabulary size, d is the embedding dimension, and T is the dimension of the LSH projection representation. We also propose a local projection attention (LPA) layer, which uses self-attention to transform the input sequence of N LSH word projections into a sequence of N/K representations, reducing the computation quadratically, by a factor of O(K^2). We evaluate ProFormer on multiple text classification tasks and observe improvements over prior state-of-the-art on-device approaches for short text classification and comparable performance for long text classification. In comparison with a 2-layer BERT model, ProFormer reduces the embedding memory footprint from 92.16 MB to 1.3 KB and requires 16 times less computation overhead, making it the fastest and smallest on-device model.
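
The idea of generating a word representation on the fly, with no embedding table, can be sketched with a generic SimHash-style projection over character trigrams. This is an illustrative stand-in, not ProFormer's actual projection layer: the trigram features, the use of SHA-256, the function names, and the 16-bit output size are all assumptions made for this example.

```python
import hashlib


def char_ngrams(word, n=3):
    """Character n-grams of the word, padded with boundary markers."""
    padded = f"#{word}#"
    return [padded[i:i + n] for i in range(max(1, len(padded) - n + 1))]


def lsh_projection(word, num_bits=16):
    """SimHash-style binary code computed from the word's n-grams.

    Nothing is stored per vocabulary entry; the code is recomputed from
    the characters each time. num_bits must not exceed 256, the number
    of bits in a SHA-256 digest.
    """
    totals = [0] * num_bits
    for gram in char_ngrams(word):
        digest = hashlib.sha256(gram.encode("utf-8")).digest()
        for bit in range(num_bits):
            byte_index, offset = divmod(bit, 8)
            is_set = (digest[byte_index] >> offset) & 1
            totals[bit] += 1 if is_set else -1  # each n-gram votes on each bit
    return [1 if t > 0 else 0 for t in totals]


code = lsh_projection("transformer")
```

Words that share many trigrams cast similar votes and tend to receive similar codes, and the memory needed depends only on the code length rather than on the vocabulary size.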

Environment-agnostic Multitask Learning for Natural Language Grounded Navigation

Mar 12, 2020

Xin Wang, Vihan Jain, Eugene Ie, William Yang Wang, Zornitsa Kozareva, Sujith Ravi

Abstract: Recent research efforts enable the study of natural language grounded navigation in photo-realistic environments, e.g., following natural language instructions or dialog. However, existing methods tend to overfit training data in seen environments and fail to generalize well in previously unseen environments. In order to close the gap between seen and unseen environments, we aim at learning a generalized navigation model from two novel perspectives: (1) we introduce a multitask navigation model that can be seamlessly trained on both Vision-Language Navigation (VLN) and Navigation from Dialog History (NDH) tasks, which benefits from richer natural language guidance and effectively transfers knowledge across tasks; (2) we propose to learn environment-agnostic representations for the navigation policy that are invariant among the environments seen during training, thus generalizing better on unseen environments. Extensive experiments show that training with an environment-agnostic multitask learning objective significantly reduces the performance gap between seen and unseen environments, and the navigation agent so trained outperforms the baselines on unseen environments by 16% (relative measure on success rate) on VLN and 120% (goal progress) on NDH. Our submission to the CVDN leaderboard establishes a new state-of-the-art for the NDH task, outperforming the existing best model by more than 66% (goal progress) on the holdout test set. The code for training the navigation model using environment-agnostic multitask learning is available at https://github.com/google-research/valan.

Learning from a Teacher using Unlabeled Data

Nov 13, 2019

Gaurav Menghani, Sujith Ravi

Abstract: Knowledge distillation is a widely used technique for model compression. We posit that the teacher model used in a distillation setup captures relationships between classes that extend beyond the original dataset. We empirically show that a teacher model can transfer this knowledge to a student model even on an out-of-distribution dataset. Using this approach, we show promising results on MNIST, CIFAR-10, and Caltech-256 datasets using unlabeled image data from different sources. Our results are encouraging and help shed further light from the perspective of understanding knowledge distillation and utilizing unlabeled data to improve model quality.
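
A minimal sketch of why distillation needs no ground-truth labels: the student is trained to match the teacher's softened class probabilities, so any input the teacher can score is usable. The temperature-scaled cross-entropy below is the common textbook form of the loss; the abstract does not say which loss or temperature the authors use, so both, along with the function names and example logits, are assumptions for this illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy between teacher and student soft predictions.

    No label appears anywhere: the teacher's output is the target.
    """
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(p * math.log(q) for p, q in zip(teacher_probs, student_probs))


# Hypothetical logits for one unlabeled image over three classes.
matched = distillation_loss([2.0, 0.0, -1.0], [2.0, 0.0, -1.0])
mismatched = distillation_loss([2.0, 0.0, -1.0], [-1.0, 0.0, 2.0])
```

The loss is smallest when the student reproduces the teacher's distribution, including the relative probabilities the teacher assigns to the wrong classes.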
