Andres Codas

AgentInstruct: Toward Generative Teaching with Agentic Flows

Jul 03, 2024

Arindam Mitra, Luciano Del Corro, Guoqing Zheng, Shweti Mahajan, Dany Rouhana, Andres Codas, Yadong Lu, Wei-ge Chen, Olga Vrousgos, Corby Rosset (+4 more)

Abstract: Synthetic data is becoming increasingly important for accelerating the development of language models, both large and small. Despite several successful use cases, researchers have also raised concerns about model collapse and the drawbacks of imitating other models. This discrepancy can be attributed to the fact that synthetic data varies in quality and diversity. Effective use of synthetic data usually requires significant human effort in curating the data. We focus on using synthetic data for post-training, specifically creating data with powerful models to teach a new skill or behavior to another model. We refer to this setting as Generative Teaching. We introduce AgentInstruct, an extensible agentic framework for automatically creating large amounts of diverse and high-quality synthetic data. AgentInstruct can create both the prompts and responses, using only raw data sources like text documents and code files as seeds. We demonstrate the utility of AgentInstruct by creating a post-training dataset of 25M pairs to teach language models different skills, such as text editing, creative writing, tool usage, coding, reading comprehension, etc. The dataset can be used for instruction tuning of any base model. We post-train Mistral-7b with the data. When comparing the resulting model, Orca-3, to Mistral-7b-Instruct (which uses the same base model), we observe significant improvements across many benchmarks: for example, a 40% improvement on AGIEval, 19% on MMLU, 54% on GSM8K, 38% on BBH, and 45% on AlpacaEval. Additionally, it consistently outperforms other models such as LLAMA-8B-instruct and GPT-3.5-turbo.

Orca 2: Teaching Small Language Models How to Reason

Nov 21, 2023

Arindam Mitra, Luciano Del Corro, Shweti Mahajan, Andres Codas, Clarisse Simoes, Sahaj Agarwal, Xuxi Chen, Anastasia Razdaibiedina, Erik Jones, Kriti Aggarwal (+5 more)

Abstract: Orca 1 learns from rich signals, such as explanation traces, allowing it to outperform conventional instruction-tuned models on benchmarks like BigBench Hard and AGIEval. In Orca 2, we continue exploring how improved training signals can enhance smaller LMs' reasoning abilities. Research on training small LMs has often relied on imitation learning to replicate the output of more capable models. We contend that excessive emphasis on imitation may restrict the potential of smaller models. We seek to teach small LMs to employ different solution strategies for different tasks, potentially different from the one used by the larger model. For example, while larger models might provide a direct answer to a complex task, smaller models may not have the same capacity. In Orca 2, we teach the model various reasoning techniques (step-by-step, recall then generate, recall-reason-generate, direct answer, etc.). More crucially, we aim to help the model learn to determine the most effective solution strategy for each task. We evaluate Orca 2 using a comprehensive set of 15 diverse benchmarks (corresponding to approximately 100 tasks and over 36,000 unique prompts). Orca 2 significantly surpasses models of similar size and attains performance levels similar to or better than those of models 5-10x larger, as assessed on complex tasks that test advanced reasoning abilities in zero-shot settings. We make Orca 2 weights publicly available at aka.ms/orca-lm to support research on the development, evaluation, and alignment of smaller LMs.

* Added URL to model weights; fixed typo in author name

Fault-Aware Neural Code Rankers

Jun 04, 2022

Jeevana Priya Inala, Chenglong Wang, Mei Yang, Andres Codas, Mark Encarnación, Shuvendu K Lahiri, Madanlal Musuvathi, Jianfeng Gao

Abstract: Large language models (LLMs) have demonstrated an impressive ability to generate code for various programming tasks. In many instances, LLMs can generate a correct program for a task when given numerous trials. Consequently, a recent trend is to sample programs from a model at large scale and then filter or rank them, based on their execution on a small number of known unit tests, to select one candidate solution. However, these approaches assume that the unit tests are given and that the generated programs can be executed safely (even though they may perform arbitrary dangerous operations such as file manipulations). Both of these assumptions are impractical in real-world software development. In this paper, we propose fault-aware neural code rankers that can predict the correctness of a sampled program without executing it. The fault-aware rankers are trained to predict different kinds of execution information, such as the exact compile/runtime error type (e.g., an IndexError or a TypeError). We show that our fault-aware rankers can significantly increase the pass@1 accuracy of various code generation models (including Codex, GPT-Neo, GPT-J) on the APPS, HumanEval and MBPP datasets.
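
The selection step described in this abstract can be illustrated with a minimal sketch. It assumes a ranker has already assigned each sampled program a probability for each execution outcome; the `Candidate` class, the `outcome_probs` field, the `"correct"` label, and the probability values are hypothetical names and numbers chosen for illustration, not the paper's implementation. No generated program is executed.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    source: str
    # Hypothetical ranker output: predicted probability per execution outcome,
    # e.g. {"correct": 0.7, "IndexError": 0.2, "TypeError": 0.1}.
    outcome_probs: dict


def rank_candidates(candidates):
    """Order candidates by predicted probability of being correct, highest first."""
    return sorted(
        candidates,
        key=lambda c: c.outcome_probs.get("correct", 0.0),
        reverse=True,
    )


def select_top(candidates):
    """Return the single candidate that would be submitted for pass@1."""
    return rank_candidates(candidates)[0]


# Illustrative (made-up) ranker predictions for three sampled programs.
samples = [
    Candidate("def f(xs): return xs[len(xs)]", {"correct": 0.05, "IndexError": 0.90}),
    Candidate("def f(xs): return xs[-1]", {"correct": 0.80, "IndexError": 0.15}),
    Candidate("def f(xs): return xs + 1", {"correct": 0.10, "TypeError": 0.85}),
]
best = select_top(samples)
print(best.source)
```

Sorting on the predicted "correct" probability alone is the simplest choice; the extra error-type labels matter during training of such a ranker rather than in this selection step.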

S3RP: Self-Supervised Super-Resolution and Prediction for Advection-Diffusion Process

Nov 08, 2021

Chulin Wang, Kyongmin Yeo, Xiao Jin, Andres Codas, Levente J. Klein, Bruce Elmegreen

Abstract: We present a super-resolution model for an advection-diffusion process with limited information. While most super-resolution models assume high-resolution (HR) ground-truth data during training, in many cases such an HR dataset is not readily accessible. Here, we show that a Recurrent Convolutional Network trained with physics-based regularizations is able to reconstruct the HR information without having the HR ground-truth data. Moreover, considering the ill-posed nature of a super-resolution problem, we employ the Recurrent Wasserstein Autoencoder to model the uncertainty.
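
The abstract does not spell out the form of its physics-based regularizations. One common choice, shown here purely as an assumption, is to penalize the finite-difference residual of the governing equation. The sketch below does this for a 1D advection-diffusion equation, c_t + u c_x = D c_xx, on a periodic grid; the function names, the grid setup, and the constant `u` and `D` are illustrative and not taken from the paper.

```python
import math


def advection_diffusion_residual(c_now, c_next, u, D, dx, dt):
    """Residual of c_t + u*c_x - D*c_xx at each point of a periodic 1D grid.

    Uses a forward difference in time and central differences in space.
    """
    n = len(c_now)
    residual = []
    for i in range(n):
        left = c_now[(i - 1) % n]
        right = c_now[(i + 1) % n]
        c_t = (c_next[i] - c_now[i]) / dt
        c_x = (right - left) / (2 * dx)
        c_xx = (right - 2 * c_now[i] + left) / dx**2
        residual.append(c_t + u * c_x - D * c_xx)
    return residual


def physics_penalty(c_now, c_next, u, D, dx, dt):
    """Mean squared PDE residual, usable as an extra loss term."""
    r = advection_diffusion_residual(c_now, c_next, u, D, dx, dt)
    return sum(v * v for v in r) / len(r)


# Check on a known solution: c(x, t) = exp(-D t) * sin(x - u t).
n, u, D, dt = 64, 1.0, 0.1, 1e-3
dx = 2 * math.pi / n


def exact(t):
    return [math.exp(-D * t) * math.sin(i * dx - u * t) for i in range(n)]


consistent = physics_penalty(exact(0.0), exact(dt), u, D, dx, dt)
frozen = physics_penalty(exact(0.0), exact(0.0), u, D, dx, dt)
print(consistent, frozen)
```

A pair of frames that obeys the equation yields a penalty near zero, while a field that does not evolve at all is penalized heavily, which is the kind of signal that can stand in for missing HR ground truth.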

* 9 pages, 8 figures

Accelerating Physics-Based Simulations Using Neural Network Proxies: An Application in Oil Reservoir Modeling

May 23, 2019

Jiri Navratil, Alan King, Jesus Rios, Georgios Kollias, Ruben Torrado, Andres Codas

Abstract: We develop a proxy model based on deep learning methods to accelerate the simulations of oil reservoirs by three orders of magnitude compared to industry-strength physics-based PDE solvers. This paper describes a new architectural approach to this task, accompanied by a thorough experimental evaluation on a publicly available reservoir model. We demonstrate that in a practical setting a speedup of more than 2000X can be achieved with an average sequence error of about 10% relative to the oil-field simulator. The proxy model is contrasted with a high-quality physics-based acceleration baseline and is shown to outperform it by several orders of magnitude. We believe the outcomes presented here are extremely promising and offer a valuable benchmark for continuing research in oil field development optimization. Due to its domain-agnostic architecture, the presented approach can be extended to many applications beyond the field of oil and gas exploration.

* 9 pages, submitted to FEED-2019 KDD Workshop & Frontiers in Big Data
