Qiyue Gao

Pandora: Towards General World Model with Natural Language Actions and Video States

Jun 12, 2024

Jiannan Xiang, Guangyi Liu, Yi Gu, Qiyue Gao, Yuting Ning, Yuheng Zha, Zeyu Feng, Tianhua Tao, Shibo Hao, Yemin Shi(+3 more)

Abstract: World models simulate future states of the world in response to different actions. They facilitate interactive content creation and provide a foundation for grounded, long-horizon reasoning. Current foundation models do not fully meet the capabilities of general world models: large language models (LLMs) are constrained by their reliance on the language modality and their limited understanding of the physical world, while video models lack interactive action control over the world simulations. This paper takes a step towards building a general world model by introducing Pandora, a hybrid autoregressive-diffusion model that simulates world states by generating videos and allows real-time control with free-text actions. Pandora achieves domain generality, video consistency, and controllability through large-scale pretraining and instruction tuning. Crucially, Pandora bypasses the cost of training from scratch by integrating a pretrained LLM (7B) and a pretrained video model, requiring only additional lightweight finetuning. We illustrate extensive outputs by Pandora across diverse domains (indoor/outdoor, natural/urban, human/robot, 2D/3D, etc.). The results indicate great potential for building stronger general world models with larger-scale training.

* Website: https://world-model.maitrix.org/

LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models

Apr 08, 2024

Shibo Hao, Yi Gu, Haotian Luo, Tianyang Liu, Xiyan Shao, Xinyuan Wang, Shuhua Xie, Haodi Ma, Adithya Samavedhi, Qiyue Gao(+2 more)

Abstract: Generating accurate step-by-step reasoning is essential for Large Language Models (LLMs) to address complex problems and enhance robustness and interpretability. Despite the surge of research on developing advanced reasoning approaches, systematically analyzing the diverse LLMs and reasoning strategies in generating reasoning chains remains a significant challenge. The difficulties stem from the lack of two key elements: (1) an automatic method for evaluating the generated reasoning chains on different tasks, and (2) a unified formalism and implementation of the diverse reasoning approaches for systematic comparison. This paper aims to close the gap: (1) We introduce AutoRace for fully automated reasoning chain evaluation. Existing metrics rely on expensive human annotations or pre-defined LLM prompts that are not adaptable to different tasks. In contrast, AutoRace automatically creates detailed evaluation criteria tailored to each task, and uses GPT-4 for accurate evaluation following the criteria. (2) We develop LLM Reasoners, a library for standardized modular implementation of existing and new reasoning algorithms, under a unified formulation of the search, reward, and world model components. With the new evaluation and library, (3) we conduct an extensive study of different reasoning approaches (e.g., CoT, ToT, RAP). The analysis reveals interesting findings about different factors contributing to reasoning, including reward guidance, breadth versus depth in search, the world model, and prompt formats.

* Project website: https://www.llm-reasoners.net/

Extracting Mathematical Concepts with Large Language Models

Aug 29, 2023

Valeria de Paiva, Qiyue Gao, Pavel Kovalev, Lawrence S. Moss

Abstract: We extract mathematical concepts from mathematical text using generative large language models (LLMs) like ChatGPT, contributing to the field of automatic term extraction (ATE) and mathematical text processing, and also to the study of LLMs themselves. Our work builds on that of others in that we aim for automatic extraction of terms (keywords) in one mathematical field, category theory, using as a corpus the 755 abstracts from a snapshot of the online journal "Theory and Applications of Categories", circa 2020. Where our study diverges from previous work is in (1) providing a more thorough analysis of what makes mathematical term extraction a difficult problem to begin with; (2) paying close attention to inter-annotator disagreements; (3) providing a set of guidelines which both human and machine annotators could use to standardize the extraction process; (4) introducing a new annotation tool to help humans with ATE, applicable to any mathematical field and even beyond mathematics; (5) using prompts to ChatGPT as part of the extraction process, and proposing best practices for such prompts; and (6) raising the question of whether ChatGPT could be used as an annotator on the same level as human experts. Our overall finding is that mathematical ATE is an interesting field which can benefit from participation by LLMs, but LLMs themselves cannot at this time surpass human performance on it.

* 13 pages, 4 figures, presented to the 14th MathUI Workshop 2023

DISCO: Distilling Phrasal Counterfactuals with Large Language Models

Dec 20, 2022

Zeming Chen, Qiyue Gao, Kyle Richardson, Antoine Bosselut, Ashish Sabharwal

Abstract: Recent methods demonstrate that data augmentation using counterfactual knowledge can teach models the causal structure of a task, leading to robust and generalizable models. However, such counterfactual data often has limited scale and diversity if crowdsourced, and is computationally expensive to extend to new perturbation types if generated using supervised methods. To address this, we introduce a new framework called DISCO for automatically generating high-quality counterfactual data at scale. DISCO engineers prompts to generate phrasal perturbations with a large general language model. Then, a task-specific teacher model filters the generations to distill high-quality counterfactual data. We show that learning with this counterfactual data yields a comparatively small student model that is 6% (absolute) more robust and generalizes 5% better across distributions than baselines on various challenging evaluations. This model is also 15% more sensitive in differentiating original and counterfactual examples, on three evaluation sets written by human workers and via human-AI collaboration.

* work in progress
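
The generate-then-filter idea in the DISCO abstract can be pictured with a small sketch. This is not the authors' implementation: the function names (`generate_and_filter`, `perturb`, `teacher`), the confidence threshold, and the acceptance rule (keep a perturbation only when the teacher confidently predicts a label different from the original) are all assumptions made for illustration, and the toy perturbation and teacher stand in for a real language model and classifier.

```python
from typing import Callable, Iterable, List, Tuple

Example = Tuple[str, str]  # (text, label)


def generate_and_filter(
    seeds: Iterable[Example],
    perturb: Callable[[str], Iterable[str]],
    teacher: Callable[[str], Tuple[str, float]],
    min_confidence: float = 0.9,
) -> List[Example]:
    """Over-generate perturbations, then keep only those a teacher accepts.

    Assumed acceptance rule (hypothetical, not taken from the paper):
    the teacher's predicted label differs from the seed label and the
    teacher's confidence is at least `min_confidence`.
    """
    kept: List[Example] = []
    for text, label in seeds:
        for candidate in perturb(text):
            predicted, confidence = teacher(candidate)
            if predicted != label and confidence >= min_confidence:
                kept.append((candidate, predicted))
    return kept


def toy_perturb(text: str) -> List[str]:
    # Stand-in for an LLM prompt: one negated variant plus the unchanged text.
    return [text.replace(" is ", " is not "), text]


def toy_teacher(text: str) -> Tuple[str, float]:
    # Stand-in for a task-specific classifier with a fixed confidence.
    label = "negative" if " not " in text else "positive"
    return label, 0.95


if __name__ == "__main__":
    seeds = [("The film is good", "positive")]
    print(generate_and_filter(seeds, toy_perturb, toy_teacher))
```

Passing the perturbation source and the teacher in as callables keeps the filtering loop independent of any particular model, so either side can be swapped without touching the loop.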

Curriculum: A Broad-Coverage Benchmark for Linguistic Phenomena in Natural Language Understanding

Apr 13, 2022

Zeming Chen, Qiyue Gao

Abstract: In the age of large transformer language models, linguistic evaluation plays an important role in diagnosing models' abilities and limitations in natural language understanding. However, current evaluation methods show some significant shortcomings. In particular, they do not provide insight into how well a language model captures distinct linguistic skills essential for language understanding and reasoning. Thus they fail to effectively map out the aspects of language understanding that remain challenging for existing models, which makes it hard to discover potential limitations in models and datasets. In this paper, we introduce Curriculum as a new format of NLI benchmark for evaluation of broad-coverage linguistic phenomena. Curriculum contains a collection of datasets that covers 36 types of major linguistic phenomena and an evaluation procedure for diagnosing how well a language model captures reasoning skills for distinct types of linguistic phenomena. We show that this linguistic-phenomena-driven benchmark can serve as an effective tool for diagnosing model behavior and verifying model learning quality. In addition, our experiments provide insight into the limitations of existing benchmark datasets and state-of-the-art models that may encourage future research on re-designing datasets, model architectures, and learning objectives.

* Accepted by NAACL 2022 (Main Conference)

Probing Linguistic Information For Logical Inference In Pre-trained Language Models

Dec 03, 2021

Zeming Chen, Qiyue Gao

Abstract: Progress in pre-trained language models has led to a surge of impressive results on downstream tasks for natural language understanding. Recent work on probing pre-trained language models uncovered a wide range of linguistic properties encoded in their contextualized representations. However, it is unclear whether they encode semantic knowledge that is crucial to symbolic inference methods. We propose a methodology for probing linguistic information for logical inference in pre-trained language model representations. Our probing datasets cover a list of linguistic phenomena required by major symbolic inference systems. We find that (i) pre-trained language models do encode several types of linguistic information for inference, but there are also some types of information that are weakly encoded, and (ii) language models can effectively learn missing linguistic information through fine-tuning. Overall, our findings provide insights into which aspects of linguistic information for logical inference language models and their pre-training procedures capture. Moreover, we demonstrate language models' potential as semantic and background knowledge bases for supporting symbolic inference methods.

* Accepted in AAAI 2022

NeuralLog: Natural Language Inference with Joint Neural and Logical Reasoning

Jun 10, 2021

Zeming Chen, Qiyue Gao, Lawrence S. Moss

Abstract: Deep learning (DL) based language models achieve high performance on various benchmarks for Natural Language Inference (NLI). Meanwhile, symbolic approaches to NLI are receiving less attention. Both approaches (symbolic and DL) have their advantages and weaknesses. However, no current method combines them in a single system to solve the NLI task. To merge symbolic and deep learning methods, we propose an inference framework called NeuralLog, which utilizes both a monotonicity-based logical inference engine and a neural network language model for phrase alignment. Our framework models the NLI task as a classic search problem and uses the beam search algorithm to search for optimal inference paths. Experiments show that our joint logic and neural inference system improves accuracy on the NLI task and can achieve state-of-the-art accuracy on the SICK and MED datasets.

* 8 pages, 4 figures, The 10th Joint Conference on Lexical and Computational Semantics (*SEM2021) @ ACL2021
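
For readers unfamiliar with the search procedure named in the NeuralLog abstract, here is a minimal, generic beam search sketch. It is not NeuralLog's implementation: the function signature, the integer toy problem, and the scoring function are assumptions chosen only to show how a beam keeps the best few partial paths at each step. In an NLI setting the states would presumably be sentences and the expansions would be inference steps.

```python
from typing import Callable, Iterable, List, Optional, TypeVar

S = TypeVar("S")


def beam_search(
    start: S,
    expand: Callable[[S], Iterable[S]],
    score: Callable[[S], float],
    is_goal: Callable[[S], bool],
    beam_width: int = 3,
    max_steps: int = 10,
) -> Optional[List[S]]:
    """Return a path from `start` to a goal state, or None if none is found.

    At each step every path in the beam is extended by all successors of
    its last state; only the `beam_width` highest-scoring paths survive.
    """
    beam: List[List[S]] = [[start]]
    for _ in range(max_steps):
        # Stop as soon as any path in the beam ends in a goal state.
        for path in beam:
            if is_goal(path[-1]):
                return path
        candidates = [path + [nxt] for path in beam for nxt in expand(path[-1])]
        if not candidates:
            return None
        # Higher score is better; keep only the top `beam_width` paths.
        candidates.sort(key=lambda p: score(p[-1]), reverse=True)
        beam = candidates[:beam_width]
    for path in beam:
        if is_goal(path[-1]):
            return path
    return None


if __name__ == "__main__":
    # Toy problem (hypothetical): reach 10 from 1 using "+1" or "*2".
    path = beam_search(
        start=1,
        expand=lambda n: [n + 1, n * 2],
        score=lambda n: -abs(10 - n),
        is_goal=lambda n: n == 10,
    )
    print(path)
```

Beam search is a heuristic: it can miss a valid path that scores poorly early on, and widening the beam trades more computation for a lower chance of pruning such a path.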

Monotonicity Marking from Universal Dependency Trees

May 12, 2021

Zeming Chen, Qiyue Gao

Abstract: Dependency parsing is a tool widely used in the fields of natural language processing and computational linguistics. However, there is hardly any work that connects dependency parsing to monotonicity, which is an essential part of logic and linguistic semantics. In this paper, we present a system that automatically annotates monotonicity information based on Universal Dependency parse trees. Our system utilizes surface-level monotonicity facts about quantifiers, lexical items, and token-level polarity information. We compared our system's performance with existing systems in the literature, including NatLog and ccg2mono, on a small evaluation dataset. Results show that our system outperforms NatLog and ccg2mono.

* 10 pages, 3 figures, The 14th International Conference on Computational Semantics (IWCS 2021)
