Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Lingling Zhang

PhysReason: A Comprehensive Benchmark towards Physics-Based Reasoning

Feb 17, 2025

Xinyu Zhang, Yuxuan Dong, Yanrui Wu, Jiaxing Huang, Chengyou Jia, Basura Fernando, Mike Zheng Shou, Lingling Zhang, Jun Liu

Abstract:Large language models demonstrate remarkable capabilities across various domains, especially mathematics and logic reasoning. However, current evaluations overlook physics-based reasoning - a complex task requiring physics theorems and constraints. We present PhysReason, a 1,200-problem benchmark comprising knowledge-based (25%) and reasoning-based (75%) problems, where the latter are divided into three difficulty levels (easy, medium, hard). Notably, problems require an average of 8.1 solution steps, with hard requiring 15.6, reflecting the complexity of physics-based reasoning. We propose the Physics Solution Auto Scoring Framework, incorporating efficient answer-level and comprehensive step-level evaluations. Top-performing models like Deepseek-R1, Gemini-2.0-Flash-Thinking, and o3-mini-high achieve less than 60% on answer-level evaluation, with performance dropping from knowledge questions (75.11%) to hard problems (31.95%). Through step-level evaluation, we identified four key bottlenecks: Physics Theorem Application, Physics Process Understanding, Calculation, and Physics Condition Analysis. These findings position PhysReason as a novel and comprehensive benchmark for evaluating physics-based reasoning capabilities in large language models. Our code and data will be published at https:/dxzxy12138.github.io/PhysReason.

Via

Access Paper or Ask Questions

DiagramQG: A Dataset for Generating Concept-Focused Questions from Diagrams

Nov 26, 2024

Xinyu Zhang, Lingling Zhang, Yanrui Wu, Muye Huang, Wenjun Wu, Bo Li, Shaowei Wang, Jun Liu

Figure 1 for DiagramQG: A Dataset for Generating Concept-Focused Questions from Diagrams

Figure 2 for DiagramQG: A Dataset for Generating Concept-Focused Questions from Diagrams

Figure 3 for DiagramQG: A Dataset for Generating Concept-Focused Questions from Diagrams

Figure 4 for DiagramQG: A Dataset for Generating Concept-Focused Questions from Diagrams

Abstract:Visual Question Generation (VQG) has gained significant attention due to its potential in educational applications. However, VQG researches mainly focus on natural images, neglecting diagrams in educational materials used to assess students' conceptual understanding. To address this gap, we introduce DiagramQG, a dataset containing 8,372 diagrams and 19,475 questions across various subjects. DiagramQG introduces concept and target text constraints, guiding the model to generate concept-focused questions for educational purposes. Meanwhile, we present the Hierarchical Knowledge Integration framework for Diagram Question Generation (HKI-DQG) as a strong baseline. This framework obtains multi-scale patches of diagrams and acquires knowledge using a visual language model with frozen parameters. It then integrates knowledge, text constraints and patches to generate concept-focused questions. We evaluate the performance of existing VQG models, open-source and closed-source vision-language models, and HKI-DQG on the DiagramQG dataset. Our HKI-DQG outperform existing methods, demonstrating that it serves as a strong baseline. Furthermore, to assess its generalizability, we apply HKI-DQG to two other VQG datasets of natural images, namely VQG-COCO and K-VQG, achieving state-of-the-art performance.The dataset and code are available at https://dxzxy12138.github.io/diagramqg-home.

Via

Access Paper or Ask Questions

Few-shot Open Relation Extraction with Gaussian Prototype and Adaptive Margin

Oct 27, 2024

Tianlin Guo, Lingling Zhang, Jiaxin Wang, Yuokuo Lei, Yifei Li, Haofen Wang, Jun Liu

Figure 1 for Few-shot Open Relation Extraction with Gaussian Prototype and Adaptive Margin

Figure 2 for Few-shot Open Relation Extraction with Gaussian Prototype and Adaptive Margin

Figure 3 for Few-shot Open Relation Extraction with Gaussian Prototype and Adaptive Margin

Figure 4 for Few-shot Open Relation Extraction with Gaussian Prototype and Adaptive Margin

Abstract:Few-shot relation extraction with none-of-the-above (FsRE with NOTA) aims at predicting labels in few-shot scenarios with unknown classes. FsRE with NOTA is more challenging than the conventional few-shot relation extraction task, since the boundaries of unknown classes are complex and difficult to learn. Meta-learning based methods, especially prototype-based methods, are the mainstream solutions to this task. They obtain the classification boundary by learning the sample distribution of each class. However, their performance is limited because few-shot overfitting and NOTA boundary confusion lead to misclassification between known and unknown classes. To this end, we propose a novel framework based on Gaussian prototype and adaptive margin named GPAM for FsRE with NOTA, which includes three modules, semi-factual representation, GMM-prototype metric learning and decision boundary learning. The first two modules obtain better representations to solve the few-shot problem through debiased information enhancement and Gaussian space distance measurement. The third module learns more accurate classification boundaries and prototypes through adaptive margin and negative sampling. In the training procedure of GPAM, we use contrastive learning loss to comprehensively consider the effects of range and margin on the classification of known and unknown classes to ensure the model's stability and robustness. Sufficient experiments and ablations on the FewRel dataset show that GPAM surpasses previous prototype methods and achieves state-of-the-art performance.

* 30 pages, 4 figures

Via

Access Paper or Ask Questions

GoT-CQA: Graph-of-Thought Guided Compositional Reasoning for Chart Question Answering

Sep 04, 2024

Lingling Zhang, Muye Huang, QianYing Wang, Yaxian Wang, Wenjun Wu, Jun Liu

Figure 1 for GoT-CQA: Graph-of-Thought Guided Compositional Reasoning for Chart Question Answering

Figure 2 for GoT-CQA: Graph-of-Thought Guided Compositional Reasoning for Chart Question Answering

Figure 3 for GoT-CQA: Graph-of-Thought Guided Compositional Reasoning for Chart Question Answering

Figure 4 for GoT-CQA: Graph-of-Thought Guided Compositional Reasoning for Chart Question Answering

Abstract:Chart Question Answering (CQA) aims at answering questions based on the visual chart content, which plays an important role in chart sumarization, business data analysis, and data report generation. CQA is a challenging multi-modal task because of the strong context dependence and complex reasoning requirement. The former refers to answering this question strictly based on the analysis of the visual content or internal data of the given chart, while the latter emphasizes the various logical and numerical reasoning involved in answer prediction process. In this paper, we pay more attention on the complex reasoning in CQA task, and propose a novel Graph-of-Thought (GoT) guided compositional reasoning model called GoT-CQA to overcome this problem. At first, we transform the chart-oriented question into a directed acyclic GoT composed of multiple operator nodes, including localization, numerical and logical operator. It intuitively reflects the human brain's solution process to this question. After that, we design an efficient auto-compositional reasoning framework guided by the GoT, to excute the multi-step reasoning operations in various types of questions. Comprehensive experiments on ChartQA and PlotQA-D datasets show that GoT-CQA achieves outstanding performance, especially in complex human-written and reasoning questions, comparing with the latest popular baselines.

Via

Access Paper or Ask Questions

VProChart: Answering Chart Question through Visual Perception Alignment Agent and Programmatic Solution Reasoning

Sep 03, 2024

Muye Huang, Lingling Zhang, Lai Han, Wenjun Wu, Xinyu Zhang, Jun Liu

Abstract:Charts are widely used for data visualization across various fields, including education, research, and business. Chart Question Answering (CQA) is an emerging task focused on the automatic interpretation and reasoning of data presented in charts. However, chart images are inherently difficult to interpret, and chart-related questions often involve complex logical and numerical reasoning, which hinders the performance of existing models. This paper introduces VProChart, a novel framework designed to address these challenges in CQA by integrating a lightweight Visual Perception Alignment Agent (VPAgent) and a Programmatic Solution Reasoning approach. VPAgent aligns and models chart elements based on principles of human visual perception, enhancing the understanding of chart context. The Programmatic Solution Reasoning approach leverages large language models (LLMs) to transform natural language reasoning questions into structured solution programs, facilitating precise numerical and logical reasoning. Extensive experiments on benchmark datasets such as ChartQA and PlotQA demonstrate that VProChart significantly outperforms existing methods, highlighting its capability in understanding and reasoning with charts.

Via

Access Paper or Ask Questions

EvoChart: A Benchmark and a Self-Training Approach Towards Real-World Chart Understanding

Sep 03, 2024

Muye Huang, Lai Han, Xinyu Zhang, Wenjun Wu, Jie Ma, Lingling Zhang, Jun Liu

Figure 1 for EvoChart: A Benchmark and a Self-Training Approach Towards Real-World Chart Understanding

Figure 2 for EvoChart: A Benchmark and a Self-Training Approach Towards Real-World Chart Understanding

Figure 3 for EvoChart: A Benchmark and a Self-Training Approach Towards Real-World Chart Understanding

Figure 4 for EvoChart: A Benchmark and a Self-Training Approach Towards Real-World Chart Understanding

Abstract:Chart understanding enables automated data analysis for humans, which requires models to achieve highly accurate visual comprehension. While existing Visual Language Models (VLMs) have shown progress in chart understanding, the lack of high-quality training data and comprehensive evaluation benchmarks hinders VLM chart comprehension. In this paper, we introduce EvoChart, a novel self-training method for generating synthetic chart data to enhance VLMs' capabilities in real-world chart comprehension. We also propose EvoChart-QA, a noval benchmark for measuring models' chart comprehension abilities in real-world scenarios. Specifically, EvoChart is a unique self-training data synthesis approach that simultaneously produces high-quality training corpus and a high-performance chart understanding model. EvoChart-QA consists of 650 distinct real-world charts collected from 140 different websites and 1,250 expert-curated questions that focus on chart understanding. Experimental results on various open-source and proprietary VLMs tested on EvoChart-QA demonstrate that even the best proprietary model, GPT-4o, achieves only 49.8% accuracy. Moreover, the EvoChart method significantly boosts the performance of open-source VLMs on real-world chart understanding tasks, achieving 54.2% accuracy on EvoChart-QA.

Via

Access Paper or Ask Questions

Learning from Semi-Factuals: A Debiased and Semantic-Aware Framework for Generalized Relation Discovery

Jan 12, 2024

Jiaxin Wang, Lingling Zhang, Jun Liu, Tianlin Guo, Wenjun Wu

Abstract:We introduce a novel task, called Generalized Relation Discovery (GRD), for open-world relation extraction. GRD aims to identify unlabeled instances in existing pre-defined relations or discover novel relations by assigning instances to clusters as well as providing specific meanings for these clusters. The key challenges of GRD are how to mitigate the serious model biases caused by labeled pre-defined relations to learn effective relational representations and how to determine the specific semantics of novel relations during classifying or clustering unlabeled instances. We then propose a novel framework, SFGRD, for this task to solve the above issues by learning from semi-factuals in two stages. The first stage is semi-factual generation implemented by a tri-view debiased relation representation module, in which we take each original sentence as the main view and design two debiased views to generate semi-factual examples for this sentence. The second stage is semi-factual thinking executed by a dual-space tri-view collaborative relation learning module, where we design a cluster-semantic space and a class-index space to learn relational semantics and relation label indices, respectively. In addition, we devise alignment and selection strategies to integrate two spaces and establish a self-supervised learning loop for unlabeled data by doing semi-factual thinking across three views. Extensive experimental results show that SFGRD surpasses state-of-the-art models in terms of accuracy by 2.36\% $\sim$5.78\% and cosine similarity by 32.19\%$\sim$ 84.45\% for relation label index and relation semantic quality, respectively. To the best of our knowledge, we are the first to exploit the efficacy of semi-factuals in relation extraction.

Via

Access Paper or Ask Questions

Mind Reasoning Manners: Enhancing Type Perception for Generalized Zero-shot Logical Reasoning over Text

Jan 08, 2023

Fangzhi Xu, Jun Liu, Qika Lin, Tianzhe Zhao, Jian Zhang, Lingling Zhang

Figure 1 for Mind Reasoning Manners: Enhancing Type Perception for Generalized Zero-shot Logical Reasoning over Text

Figure 2 for Mind Reasoning Manners: Enhancing Type Perception for Generalized Zero-shot Logical Reasoning over Text

Figure 3 for Mind Reasoning Manners: Enhancing Type Perception for Generalized Zero-shot Logical Reasoning over Text

Figure 4 for Mind Reasoning Manners: Enhancing Type Perception for Generalized Zero-shot Logical Reasoning over Text

Abstract:Logical reasoning task involves diverse types of complex reasoning over text, based on the form of multiple-choice question answering. Given the context, question and a set of options as the input, previous methods achieve superior performances on the full-data setting. However, the current benchmark dataset has the ideal assumption that the reasoning type distribution on the train split is close to the test split, which is inconsistent with many real application scenarios. To address it, there remain two problems to be studied: (1) How is the zero-shot capability of the models (train on seen types and test on unseen types)? (2) How to enhance the perception of reasoning types for the models? For problem 1, we propose a new benchmark for generalized zero-shot logical reasoning, named ZsLR. It includes six splits based on the three type sampling strategies. For problem 2, a type-aware model TaCo is proposed. It utilizes both the heuristic input reconstruction and the contrastive learning to improve the type perception in the global representation. Extensive experiments on both the zero-shot and full-data settings prove the superiority of TaCo over the state-of-the-art methods. Also, we experiment and verify the generalization capability of TaCo on other logical reasoning dataset.

* 12 pages, 7 figures

Via

Access Paper or Ask Questions

GPTR: Gestalt-Perception Transformer for Diagram Object Detection

Dec 29, 2022

Xin Hu, Lingling Zhang, Jun Liu, Jinfu Fan, Yang You, Yaqiang Wu

Abstract:Diagram object detection is the key basis of practical applications such as textbook question answering. Because the diagram mainly consists of simple lines and color blocks, its visual features are sparser than those of natural images. In addition, diagrams usually express diverse knowledge, in which there are many low-frequency object categories in diagrams. These lead to the fact that traditional data-driven detection model is not suitable for diagrams. In this work, we propose a gestalt-perception transformer model for diagram object detection, which is based on an encoder-decoder architecture. Gestalt perception contains a series of laws to explain human perception, that the human visual system tends to perceive patches in an image that are similar, close or connected without abrupt directional changes as a perceptual whole object. Inspired by these thoughts, we build a gestalt-perception graph in transformer encoder, which is composed of diagram patches as nodes and the relationships between patches as edges. This graph aims to group these patches into objects via laws of similarity, proximity, and smoothness implied in these edges, so that the meaningful objects can be effectively detected. The experimental results demonstrate that the proposed GPTR achieves the best results in the diagram object detection task. Our model also obtains comparable results over the competitors in natural image object detection.

Via

Access Paper or Ask Questions

Computation Offloading and Resource Allocation in F-RANs: A Federated Deep Reinforcement Learning Approach

Jun 13, 2022

Lingling Zhang, Yanxiang Jiang, Fu-Chun Zheng, Mehdi Bennis, Xiaohu You

Figure 1 for Computation Offloading and Resource Allocation in F-RANs: A Federated Deep Reinforcement Learning Approach

Figure 2 for Computation Offloading and Resource Allocation in F-RANs: A Federated Deep Reinforcement Learning Approach

Figure 3 for Computation Offloading and Resource Allocation in F-RANs: A Federated Deep Reinforcement Learning Approach

Figure 4 for Computation Offloading and Resource Allocation in F-RANs: A Federated Deep Reinforcement Learning Approach

Abstract:The fog radio access network (F-RAN) is a promising technology in which the user mobile devices (MDs) can offload computation tasks to the nearby fog access points (F-APs). Due to the limited resource of F-APs, it is important to design an efficient task offloading scheme. In this paper, by considering time-varying network environment, a dynamic computation offloading and resource allocation problem in F-RANs is formulated to minimize the task execution delay and energy consumption of MDs. To solve the problem, a federated deep reinforcement learning (DRL) based algorithm is proposed, where the deep deterministic policy gradient (DDPG) algorithm performs computation offloading and resource allocation in each F-AP. Federated learning is exploited to train the DDPG agents in order to decrease the computing complexity of training process and protect the user privacy. Simulation results show that the proposed federated DDPG algorithm can achieve lower task execution delay and energy consumption of MDs more quickly compared with the other existing strategies.

* This paper has been accepted by IEEE ICC 2022

Via

Access Paper or Ask Questions