Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Anup Rao

Hallucination Diversity-Aware Active Learning for Text Summarization

Apr 02, 2024

Yu Xia, Xu Liu, Tong Yu, Sungchul Kim, Ryan A. Rossi, Anup Rao, Tung Mai, Shuai Li

Figure 1 for Hallucination Diversity-Aware Active Learning for Text Summarization

Figure 2 for Hallucination Diversity-Aware Active Learning for Text Summarization

Figure 3 for Hallucination Diversity-Aware Active Learning for Text Summarization

Figure 4 for Hallucination Diversity-Aware Active Learning for Text Summarization

Abstract:Large Language Models (LLMs) have shown propensity to generate hallucinated outputs, i.e., texts that are factually incorrect or unsupported. Existing methods for alleviating hallucinations typically require costly human annotations to identify and correct hallucinations in LLM outputs. Moreover, most of these methods focus on a specific type of hallucination, e.g., entity or token errors, which limits their effectiveness in addressing various types of hallucinations exhibited in LLM outputs. To our best knowledge, in this paper we propose the first active learning framework to alleviate LLM hallucinations, reducing costly human annotations of hallucination needed. By measuring fine-grained hallucinations from errors in semantic frame, discourse and content verifiability in text summarization, we propose HAllucination Diversity-Aware Sampling (HADAS) to select diverse hallucinations for annotations in active learning for LLM finetuning. Extensive experiments on three datasets and different backbone models demonstrate advantages of our method in effectively and efficiently mitigating LLM hallucinations.

* Accepted to NAACL 2024

Via

Access Paper or Ask Questions

Decentralized Personalized Online Federated Learning

Nov 08, 2023

Renzhi Wu, Saayan Mitra, Xiang Chen, Anup Rao

Abstract:Vanilla federated learning does not support learning in an online environment, learning a personalized model on each client, and learning in a decentralized setting. There are existing methods extending federated learning in each of the three aspects. However, some important applications on enterprise edge servers (e.g. online item recommendation at global scale) involve the three aspects at the same time. Therefore, we propose a new learning setting \textit{Decentralized Personalized Online Federated Learning} that considers all the three aspects at the same time. In this new setting for learning, the first technical challenge is how to aggregate the shared model parameters from neighboring clients to obtain a personalized local model with good performance on each client. We propose to directly learn an aggregation by optimizing the performance of the local model with respect to the aggregation weights. This not only improves personalization of each local model but also helps the local model adapting to potential data shift by intelligently incorporating the right amount of information from its neighbors. The second challenge is how to select the neighbors for each client. We propose a peer selection method based on the learned aggregation weights enabling each client to select the most helpful neighbors and reduce communication cost at the same time. We verify the effectiveness and robustness of our proposed method on three real-world item recommendation datasets and one air quality prediction dataset.

* IEEE BigData 2023

Via

Access Paper or Ask Questions

Sample Constrained Treatment Effect Estimation

Oct 12, 2022

Raghavendra Addanki, David Arbour, Tung Mai, Cameron Musco, Anup Rao

Figure 1 for Sample Constrained Treatment Effect Estimation

Figure 2 for Sample Constrained Treatment Effect Estimation

Figure 3 for Sample Constrained Treatment Effect Estimation

Figure 4 for Sample Constrained Treatment Effect Estimation

Abstract:Treatment effect estimation is a fundamental problem in causal inference. We focus on designing efficient randomized controlled trials, to accurately estimate the effect of some treatment on a population of $n$ individuals. In particular, we study sample-constrained treatment effect estimation, where we must select a subset of $s \ll n$ individuals from the population to experiment on. This subset must be further partitioned into treatment and control groups. Algorithms for partitioning the entire population into treatment and control groups, or for choosing a single representative subset, have been well-studied. The key challenge in our setting is jointly choosing a representative subset and a partition for that set. We focus on both individual and average treatment effect estimation, under a linear effects model. We give provably efficient experimental designs and corresponding estimators, by identifying connections to discrepancy minimization and leverage-score-based sampling used in randomized numerical linear algebra. Our theoretical results obtain a smooth transition to known guarantees when $s$ equals the population size. We also empirically demonstrate the performance of our algorithms.

* Conference on Neural Information Processing Systems (NeurIPS) 2022

Via

Access Paper or Ask Questions

Electra: Conditional Generative Model based Predicate-Aware Query Approximation

Jan 28, 2022

Nikhil Sheoran, Subrata Mitra, Vibhor Porwal, Siddharth Ghetia, Jatin Varshney, Tung Mai, Anup Rao, Vikas Maddukuri

Figure 1 for Electra: Conditional Generative Model based Predicate-Aware Query Approximation

Figure 2 for Electra: Conditional Generative Model based Predicate-Aware Query Approximation

Figure 3 for Electra: Conditional Generative Model based Predicate-Aware Query Approximation

Figure 4 for Electra: Conditional Generative Model based Predicate-Aware Query Approximation

Abstract:The goal of Approximate Query Processing (AQP) is to provide very fast but "accurate enough" results for costly aggregate queries thereby improving user experience in interactive exploration of large datasets. Recently proposed Machine-Learning based AQP techniques can provide very low latency as query execution only involves model inference as compared to traditional query processing on database clusters. However, with increase in the number of filtering predicates(WHERE clauses), the approximation error significantly increases for these methods. Analysts often use queries with a large number of predicates for insights discovery. Thus, maintaining low approximation error is important to prevent analysts from drawing misleading conclusions. In this paper, we propose ELECTRA, a predicate-aware AQP system that can answer analytics-style queries with a large number of predicates with much smaller approximation errors. ELECTRA uses a conditional generative model that learns the conditional distribution of the data and at runtime generates a small (~1000 rows) but representative sample, on which the query is executed to compute the approximate result. Our evaluations with four different baselines on three real-world datasets show that ELECTRA provides lower AQP error for large number of predicates compared to baselines.

* To appear in Proceedings of AAAI 2022

Via

Access Paper or Ask Questions

Online MAP Inference and Learning for Nonsymmetric Determinantal Point Processes

Nov 29, 2021

Aravind Reddy, Ryan A. Rossi, Zhao Song, Anup Rao, Tung Mai, Nedim Lipka, Gang Wu, Eunyee Koh, Nesreen Ahmed

Figure 1 for Online MAP Inference and Learning for Nonsymmetric Determinantal Point Processes

Figure 2 for Online MAP Inference and Learning for Nonsymmetric Determinantal Point Processes

Figure 3 for Online MAP Inference and Learning for Nonsymmetric Determinantal Point Processes

Figure 4 for Online MAP Inference and Learning for Nonsymmetric Determinantal Point Processes

Abstract:In this paper, we introduce the online and streaming MAP inference and learning problems for Non-symmetric Determinantal Point Processes (NDPPs) where data points arrive in an arbitrary order and the algorithms are constrained to use a single-pass over the data as well as sub-linear memory. The online setting has an additional requirement of maintaining a valid solution at any point in time. For solving these new problems, we propose algorithms with theoretical guarantees, evaluate them on several real-world datasets, and show that they give comparable performance to state-of-the-art offline algorithms that store the entire data in memory and take multiple passes over it.

Via

Access Paper or Ask Questions

An Interpretable Graph Generative Model with Heterophily

Nov 04, 2021

Sudhanshu Chanpuriya, Ryan A. Rossi, Anup Rao, Tung Mai, Nedim Lipka, Zhao Song, Cameron Musco

Figure 1 for An Interpretable Graph Generative Model with Heterophily

Figure 2 for An Interpretable Graph Generative Model with Heterophily

Figure 3 for An Interpretable Graph Generative Model with Heterophily

Figure 4 for An Interpretable Graph Generative Model with Heterophily

Abstract:Many models for graphs fall under the framework of edge-independent dot product models. These models output the probabilities of edges existing between all pairs of nodes, and the probability of a link between two nodes increases with the dot product of vectors associated with the nodes. Recent work has shown that these models are unable to capture key structures in real-world graphs, particularly heterophilous structures, wherein links occur between dissimilar nodes. We propose the first edge-independent graph generative model that is a) expressive enough to capture heterophily, b) produces nonnegative embeddings, which allow link predictions to be interpreted in terms of communities, and c) optimizes effectively on real-world graphs with gradient descent on a cross-entropy loss. Our theoretical results demonstrate the expressiveness of our model in its ability to exactly reconstruct a graph using a number of clusters that is linear in the maximum degree, along with its ability to capture both heterophily and homophily in the data. Further, our experiments demonstrate the effectiveness of our model for a variety of important application tasks such as multi-label clustering and link prediction.

Via

Access Paper or Ask Questions

Multiscale Manifold Warping

Sep 19, 2021

Sridhar Mahadevan, Anup Rao, Georgios Theocharous, Jennifer Healey

Figure 1 for Multiscale Manifold Warping

Figure 2 for Multiscale Manifold Warping

Figure 3 for Multiscale Manifold Warping

Figure 4 for Multiscale Manifold Warping

Abstract:Many real-world applications require aligning two temporal sequences, including bioinformatics, handwriting recognition, activity recognition, and human-robot coordination. Dynamic Time Warping (DTW) is a popular alignment method, but can fail on high-dimensional real-world data where the dimensions of aligned sequences are often unequal. In this paper, we show that exploiting the multiscale manifold latent structure of real-world data can yield improved alignment. We introduce a novel framework called Warping on Wavelets (WOW) that integrates DTW with a a multi-scale manifold learning framework called Diffusion Wavelets. We present a theoretical analysis of the WOW family of algorithms and show that it outperforms previous state of the art methods, such as canonical time warping (CTW) and manifold warping, on several real-world datasets.

* 18 pages

Via

Access Paper or Ask Questions

Asymptotics of Ridge Regression in Convolutional Models

Mar 08, 2021

Mojtaba Sahraee-Ardakan, Tung Mai, Anup Rao, Ryan Rossi, Sundeep Rangan, Alyson K. Fletcher

Figure 1 for Asymptotics of Ridge Regression in Convolutional Models

Figure 2 for Asymptotics of Ridge Regression in Convolutional Models

Figure 3 for Asymptotics of Ridge Regression in Convolutional Models

Abstract:Understanding generalization and estimation error of estimators for simple models such as linear and generalized linear models has attracted a lot of attention recently. This is in part due to an interesting observation made in machine learning community that highly over-parameterized neural networks achieve zero training error, and yet they are able to generalize well over the test samples. This phenomenon is captured by the so called double descent curve, where the generalization error starts decreasing again after the interpolation threshold. A series of recent works tried to explain such phenomenon for simple models. In this work, we analyze the asymptotics of estimation error in ridge estimators for convolutional linear models. These convolutional inverse problems, also known as deconvolution, naturally arise in different fields such as seismology, imaging, and acoustics among others. Our results hold for a large class of input distributions that include i.i.d. features as a special case. We derive exact formulae for estimation error of ridge estimators that hold in a certain high-dimensional regime. We show the double descent phenomenon in our experiments for convolutional models and show that our theoretical results match the experiments.

Via

Access Paper or Ask Questions

Machine Unlearning via Algorithmic Stability

Feb 25, 2021

Enayat Ullah, Tung Mai, Anup Rao, Ryan Rossi, Raman Arora

Figure 1 for Machine Unlearning via Algorithmic Stability

Figure 2 for Machine Unlearning via Algorithmic Stability

Abstract:We study the problem of machine unlearning and identify a notion of algorithmic stability, Total Variation (TV) stability, which we argue, is suitable for the goal of exact unlearning. For convex risk minimization problems, we design TV-stable algorithms based on noisy Stochastic Gradient Descent (SGD). Our key contribution is the design of corresponding efficient unlearning algorithms, which are based on constructing a (maximal) coupling of Markov chains for the noisy SGD procedure. To understand the trade-offs between accuracy and unlearning efficiency, we give upper and lower bounds on excess empirical and populations risk of TV stable algorithms for convex risk minimization. Our techniques generalize to arbitrary non-convex functions, and our algorithms are differentially private as well.

Via

Access Paper or Ask Questions

Fundamental Tradeoffs in Distributionally Adversarial Training

Jan 15, 2021

Mohammad Mehrabi, Adel Javanmard, Ryan A. Rossi, Anup Rao, Tung Mai

Figure 1 for Fundamental Tradeoffs in Distributionally Adversarial Training

Figure 2 for Fundamental Tradeoffs in Distributionally Adversarial Training

Figure 3 for Fundamental Tradeoffs in Distributionally Adversarial Training

Abstract:Adversarial training is among the most effective techniques to improve the robustness of models against adversarial perturbations. However, the full effect of this approach on models is not well understood. For example, while adversarial training can reduce the adversarial risk (prediction error against an adversary), it sometimes increase standard risk (generalization error when there is no adversary). Even more, such behavior is impacted by various elements of the learning problem, including the size and quality of training data, specific forms of adversarial perturbations in the input, model overparameterization, and adversary's power, among others. In this paper, we focus on \emph{distribution perturbing} adversary framework wherein the adversary can change the test distribution within a neighborhood of the training data distribution. The neighborhood is defined via Wasserstein distance between distributions and the radius of the neighborhood is a measure of adversary's manipulative power. We study the tradeoff between standard risk and adversarial risk and derive the Pareto-optimal tradeoff, achievable over specific classes of models, in the infinite data limit with features dimension kept fixed. We consider three learning settings: 1) Regression with the class of linear models; 2) Binary classification under the Gaussian mixtures data model, with the class of linear classifiers; 3) Regression with the class of random features model (which can be equivalently represented as two-layer neural network with random first-layer weights). We show that a tradeoff between standard and adversarial risk is manifested in all three settings. We further characterize the Pareto-optimal tradeoff curves and discuss how a variety of factors, such as features correlation, adversary's power or the width of two-layer neural network would affect this tradeoff.

* 23 pages, 3 figures

Via

Access Paper or Ask Questions