Saurabh Verma

Sid

The Llama 3 Herd of Models

Jul 31, 2024

Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan(+521 more)

Abstract: Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language models such as GPT-4 on a plethora of tasks. We publicly release Llama 3, including pre-trained and post-trained versions of the 405B parameter language model and our Llama Guard 3 model for input and output safety. The paper also presents the results of experiments in which we integrate image, video, and speech capabilities into Llama 3 via a compositional approach. We observe this approach performs competitively with the state-of-the-art on image, video, and speech recognition tasks. The resulting models are not yet being broadly released as they are still under development.

Algorithm Selection for Deep Active Learning with Imbalanced Datasets

Feb 14, 2023

Jifan Zhang, Shuai Shao, Saurabh Verma, Robert Nowak

Abstract: Label efficiency has become an increasingly important objective in deep learning applications. Active learning aims to reduce the number of labeled examples needed to train deep networks, but the empirical performance of active learning algorithms can vary dramatically across datasets and applications. It is difficult to know in advance which active learning strategy will perform well or best in a given application. To address this, we propose the first adaptive algorithm selection strategy for deep active learning. For any unlabeled dataset, our (meta) algorithm TAILOR (Thompson ActIve Learning algORithm selection) iteratively and adaptively chooses among a set of candidate active learning algorithms. TAILOR uses novel reward functions aimed at gathering class-balanced examples. Extensive experiments in multi-class and multi-label applications demonstrate TAILOR's effectiveness in achieving accuracy comparable to or better than that of the best of the candidate algorithms.
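
The idea of adaptively choosing among candidate algorithms can be illustrated with plain Beta-Bernoulli Thompson sampling. This is only a generic sketch of the selection loop, not TAILOR itself: the function names, the simulated 0/1 reward, and the `reward_probs` values are all assumptions made for illustration, and the paper's class-balance reward functions are not reproduced here.

```python
import random


def thompson_select(successes, failures, rng):
    """Sample each candidate's Beta posterior and return the index of the largest draw."""
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)


def run_selection(reward_probs, rounds, seed=0):
    """Simulate `rounds` selections; reward_probs[i] is a hypothetical chance
    that candidate i's query earns a reward of 1 (an assumption, not paper data)."""
    rng = random.Random(seed)
    k = len(reward_probs)
    successes = [0] * k
    failures = [0] * k
    picks = [0] * k
    for _ in range(rounds):
        arm = thompson_select(successes, failures, rng)
        picks[arm] += 1
        # Simulated binary feedback stands in for a real reward signal.
        if rng.random() < reward_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return picks


# Three hypothetical candidate algorithms with different reward rates.
picks = run_selection([0.2, 0.5, 0.8], rounds=2000)
print(picks)
```

Over many rounds the loop concentrates its picks on the candidate with the highest reward rate while still occasionally sampling the others.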

TIES: Temporal Interaction Embeddings For Enhancing Social Media Integrity At Facebook

Feb 18, 2020

Nima Noorshams, Saurabh Verma, Aude Hofleitner

Abstract: Since its inception, Facebook has become an integral part of the online social community. People rely on Facebook to make connections with others and build communities. As a result, it is paramount to protect the integrity of such a rapidly growing network in a fast and scalable manner. In this paper, we present our efforts to protect various social media entities at Facebook from people who try to abuse our platform. We present a novel Temporal Interaction EmbeddingS (TIES) model that is designed to capture rogue social interactions and flag them for further suitable actions. TIES is a supervised, deep learning, production-ready model for Facebook-scale networks. Prior works on integrity problems mostly focus on capturing either only static or only certain dynamic features of social entities. In contrast, TIES can capture both kinds of behavior in a unified model, owing to recent strides in graph embedding and deep sequential pattern learning. To show the real-world impact of TIES, we present a few applications, in particular preventing the spread of misinformation, detecting fake accounts, and reducing ads payment risks, in order to enhance the platform's integrity.

* Submitted to KDD 2020 applied DS track

Physics-Guided Deep Neural Networks for PowerFlow Analysis

Jan 31, 2020

Xinyue Hu, Haoji Hu, Saurabh Verma, Zhi-Li Zhang

Abstract: Solving power flow (PF) equations is the basis of power flow analysis, which is important in determining the best operation of existing systems, performing security analysis, etc. However, PF equations can be out-of-date or even unavailable due to system dynamics and uncertainties, making traditional numerical approaches infeasible. To address these concerns, researchers have proposed data-driven approaches to solve the PF problem by learning the mapping rules from historical system operation data. Nevertheless, prior data-driven approaches suffer from poor performance and generalizability, due to overly simplified assumptions about the PF problem or disregard for the physical laws governing power systems. In this paper, we propose a physics-guided neural network to solve the PF problem, with an auxiliary task to rebuild the PF model. By encoding Kirchhoff's laws and system topology at different granularities into the rebuilt PF model, our neural-network-based PF solver is regularized by the auxiliary task and constrained by the physical laws. The simulation results show that our physics-guided neural network methods achieve better performance and generalizability compared to existing unconstrained data-driven approaches. Furthermore, we demonstrate that the weight matrices of our physics-guided neural networks embody power system physics by showing their similarities with the bus admittance matrices.

* 8 pages

Deep Universal Graph Embedding Neural Network

Sep 25, 2019

Saurabh Verma, Zhi-Li Zhang

Abstract: Learning powerful data embeddings has become a centerpiece of machine learning, especially in the natural language processing and computer vision domains. The crux of these embeddings is that they are pretrained on huge corpora of data in an unsupervised fashion, sometimes aided with transfer learning. However, in the graph learning domain, embeddings learned through existing graph neural networks (GNNs) are currently task dependent and thus cannot be shared across different datasets. In this paper, we present a first powerful and theoretically guaranteed graph neural network that is designed to learn task-independent graph embeddings, hereafter referred to as deep universal graph embedding (DUGNN). Our DUGNN model incorporates a novel graph neural network (as a universal graph encoder) and leverages rich Graph Kernels (as a multi-task graph decoder) for both unsupervised learning and (task-specific) adaptive supervised learning. By learning task-independent graph embeddings across diverse datasets, DUGNN also reaps the benefits of transfer learning. Through extensive experiments and ablation studies, we show that the proposed DUGNN model consistently outperforms both the existing state-of-the-art GNN models and Graph Kernels, improving accuracy by 3% to 8% on graph classification benchmark datasets.

A Fast-Optimal Guaranteed Algorithm For Learning Sub-Interval Relationships in Time Series

Jun 03, 2019

Saurabh Agrawal, Saurabh Verma, Anuj Karpatne, Stefan Liess, Snigdhansu Chatterjee, Vipin Kumar

Abstract: Traditional approaches focus on finding relationships between two entire time series; however, many interesting relationships exist in small sub-intervals of time and remain feeble during other sub-intervals. We define the notion of a sub-interval relationship (SIR) to capture such interactions that are prominent only in certain sub-intervals of time. To that end, we propose a fast-optimal guaranteed algorithm to find the most interesting SIR in a pair of time series. Lastly, we demonstrate the utility and scalability of our method in the climate science domain on a real-world dataset and obtain useful domain insights.

* Accepted at The Thirty-sixth International Conference on Machine Learning (ICML 2019), Time Series Workshop. arXiv admin note: substantial text overlap with arXiv:1802.06095
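
To make the notion of a sub-interval relationship concrete, the sketch below scans fixed-width windows of two series and reports the window with the strongest Pearson correlation. It is a brute-force baseline written for illustration, not the fast-optimal algorithm the paper proposes; the function names, the fixed window width, the use of Pearson correlation as the "interestingness" score, and the synthetic series are all assumptions.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def strongest_window(x, y, width):
    """Return (start, correlation) of the width-long window with the largest |correlation|."""
    best_start, best_corr = 0, 0.0
    for start in range(len(x) - width + 1):
        corr = pearson(x[start:start + width], y[start:start + width])
        if abs(corr) > abs(best_corr):
            best_start, best_corr = start, corr
    return best_start, best_corr


# Synthetic data: the two series coincide only on indices 10..19.
x = [(7 * i) % 11 for i in range(30)]
y = [(5 * i) % 13 for i in range(30)]
y[10:20] = x[10:20]

start, corr = strongest_window(x, y, width=10)
print(start, round(corr, 3), round(pearson(x, y), 3))
```

The window scan recovers the planted sub-interval, while the correlation over the entire series is weaker, which is the situation the abstract describes.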

Stability and Generalization of Graph Convolutional Neural Networks

May 14, 2019

Saurabh Verma, Zhi-Li Zhang

Abstract: Inspired by convolutional neural networks on 1D and 2D data, graph convolutional neural networks (GCNNs) have been developed for various learning tasks on graph data, and have shown superior performance on real-world datasets. Despite their success, there is a dearth of theoretical explorations of GCNN models such as their generalization properties. In this paper, we take a first step towards developing a deeper theoretical understanding of GCNN models by analyzing the stability of single-layer GCNN models and deriving their generalization guarantees in a semi-supervised graph learning setting. In particular, we show that the algorithmic stability of a GCNN model depends upon the largest absolute eigenvalue of its graph convolution filter. Moreover, to ensure the uniform stability needed to provide strong generalization guarantees, the largest absolute eigenvalue must be independent of the graph size. Our results provide new insights into the design of new and improved graph convolution filters with guaranteed algorithmic stability. We evaluate the generalization gap and stability on various real-world graph datasets and show that the empirical results indeed support our theoretical findings. To the best of our knowledge, we are the first to study stability bounds on graph learning in a semi-supervised setting and derive generalization bounds for GCNN models.

* Accepted at The 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2019)
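
The condition that the largest absolute eigenvalue be independent of graph size can be checked numerically on a toy case. The sketch below compares an unnormalized adjacency filter with a symmetrically normalized one on complete graphs of two sizes. The choice of filters, the complete-graph test case, and all function names are assumptions made for illustration and are not taken from the paper; the eigenvalue is estimated with power iteration on the squared matrix so that eigenvalues of either sign are handled.

```python
import math


def complete_graph(n):
    """Adjacency matrix of the complete graph on n nodes (no self-loops)."""
    return [[0.0 if i == j else 1.0 for j in range(n)] for i in range(n)]


def sym_normalize(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}; assumes every node has degree > 0."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matvec(m, v):
    return [sum(a * x for a, x in zip(row, v)) for row in m]


def largest_abs_eigenvalue(m, iters=50):
    """Estimate max |eigenvalue| of a symmetric matrix via power iteration on m @ m."""
    v = [1.0] * len(m)
    for _ in range(iters):
        w = matvec(m, matvec(m, v))
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = matvec(m, v)
    return math.sqrt(sum(x * x for x in mv))


for n in (5, 50):
    adj = complete_graph(n)
    print(n, largest_abs_eigenvalue(adj), largest_abs_eigenvalue(sym_normalize(adj)))
```

For the unnormalized filter the value grows with the number of nodes (n - 1 on a complete graph), whereas the normalized filter stays at 1 for both sizes.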

Graph Capsule Convolutional Neural Networks

Aug 26, 2018

Saurabh Verma, Zhi-Li Zhang

Abstract: Graph Convolutional Neural Networks (GCNNs) are the most recent exciting advancement in the deep learning field, and their applications are quickly spreading across multiple domains including bioinformatics, chemoinformatics, social networks, natural language processing and computer vision. In this paper, we expose and tackle some of the basic weaknesses of a GCNN model using the capsule idea presented by Hinton et al. (2011) and propose our Graph Capsule Network (GCAPS-CNN) model. In addition, we design our GCAPS-CNN model specifically to solve the graph classification problem, which current GCNN models find challenging. Through extensive experiments, we show that our proposed Graph Capsule Network significantly outperforms both the existing state-of-the-art deep learning methods and graph kernels on graph classification benchmark datasets.

* Accepted at Joint ICML and IJCAI Workshop on Computational Biology, Stockholm, Sweden, 2018

Mining Sub-Interval Relationships In Time Series Data

Feb 16, 2018

Saurabh Agrawal, Saurabh Verma, Gowtham Atluri, Anuj Karpatne, Stefan Liess, Angus Macdonald III, Snigdhansu Chatterjee, Vipin Kumar

Abstract: Time-series data is being increasingly collected and studied in several areas such as neuroscience, climate science, transportation, and social media. Discovery of complex patterns of relationships between individual time series using data-driven approaches can improve our understanding of real-world systems. While traditional approaches typically study relationships between two entire time series, many interesting relationships in real-world applications exist in small sub-intervals of time while remaining absent or feeble during other sub-intervals. In this paper, we define the notion of a sub-interval relationship (SIR) to capture interactions between two time series that are prominent only in certain sub-intervals of time. We propose a novel and efficient approach to find the most interesting SIR in a pair of time series. We evaluate our proposed approach on two real-world datasets from the climate science and neuroscience domains and demonstrate its scalability and computational efficiency. We further evaluate the discovered SIRs using a randomization-based procedure. Our results indicate the existence of several such relationships that are statistically significant, some of which were also found to have a physical interpretation.
