Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Chun Zhang

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

Feb 20, 2025

M-A-P Team, Xinrun Du, Yifan Yao, Kaijing Ma, Bingli Wang, Tianyu Zheng, Kang Zhu, Minghao Liu, Yiming Liang, Xiaolong Jin(+85 more)

Abstract:Large language models (LLMs) have demonstrated remarkable proficiency in mainstream academic disciplines such as mathematics, physics, and computer science. However, human knowledge encompasses over 200 specialized disciplines, far exceeding the scope of existing benchmarks. The capabilities of LLMs in many of these specialized fields-particularly in light industry, agriculture, and service-oriented disciplines-remain inadequately evaluated. To address this gap, we present SuperGPQA, a comprehensive benchmark that evaluates graduate-level knowledge and reasoning capabilities across 285 disciplines. Our benchmark employs a novel Human-LLM collaborative filtering mechanism to eliminate trivial or ambiguous questions through iterative refinement based on both LLM responses and expert feedback. Our experimental results reveal significant room for improvement in the performance of current state-of-the-art LLMs across diverse knowledge domains (e.g., the reasoning-focused model DeepSeek-R1 achieved the highest accuracy of 61.82% on SuperGPQA), highlighting the considerable gap between current model capabilities and artificial general intelligence. Additionally, we present comprehensive insights from our management of a large-scale annotation process, involving over 80 expert annotators and an interactive Human-LLM collaborative system, offering valuable methodological guidance for future research initiatives of comparable scope.

Via

Access Paper or Ask Questions

MGSA: Multi-granularity Graph Structure Attention for Knowledge Graph-to-Text Generation

Sep 16, 2024

Shanshan Wang, Chun Zhang, Ning Zhang

Abstract:The Knowledge Graph-to-Text Generation task aims to convert structured knowledge graphs into coherent and human-readable natural language text. Recent efforts in this field have focused on enhancing pre-trained language models (PLMs) by incorporating graph structure information to capture the intricate structure details of knowledge graphs. However, most of these approaches tend to capture only single-granularity structure information, concentrating either on the relationships between entities within the original graph or on the relationships between words within the same entity or across different entities. This narrow focus results in a significant limitation: models that concentrate solely on entity-level structure fail to capture the nuanced semantic relationships between words, while those that focus only on word-level structure overlook the broader relationships between original entire entities. To overcome these limitations, this paper introduces the Multi-granularity Graph Structure Attention (MGSA), which is based on PLMs. The encoder of the model architecture features an entity-level structure encoding module, a word-level structure encoding module, and an aggregation module that synthesizes information from both structure. This multi-granularity structure encoding approach allows the model to simultaneously capture both entity-level and word-level structure information, providing a more comprehensive understanding of the knowledge graph's structure information, thereby significantly improving the quality of the generated text. We conducted extensive evaluations of the MGSA model using two widely recognized KG-to-Text Generation benchmark datasets, WebNLG and EventNarrative, where it consistently outperformed models that rely solely on single-granularity structure information, demonstrating the effectiveness of our approach.

Via

Access Paper or Ask Questions

SLR: Learning Quadruped Locomotion without Privileged Information

Jun 07, 2024

Shiyi Chen, Zeyu Wan, Shiyang Yan, Chun Zhang, Weiyi Zhang, Qiang Li, Debing Zhang, Fasih Ud Din Farrukh

Figure 1 for SLR: Learning Quadruped Locomotion without Privileged Information

Figure 2 for SLR: Learning Quadruped Locomotion without Privileged Information

Figure 3 for SLR: Learning Quadruped Locomotion without Privileged Information

Figure 4 for SLR: Learning Quadruped Locomotion without Privileged Information

Abstract:Traditional reinforcement learning control for quadruped robots often relies on privileged information, demanding meticulous selection and precise estimation, thereby imposing constraints on the development process. This work proposes a Self-learning Latent Representation (SLR) method, which achieves high-performance control policy learning without the need for privileged information. To enhance the credibility of our proposed method's evaluation, SLR is compared with open-source code repositories of state-of-the-art algorithms, retaining the original authors' configuration parameters. Across four repositories, SLR consistently outperforms the reference results. Ultimately, the trained policy and encoder empower the quadruped robot to navigate steps, climb stairs, ascend rocks, and traverse various challenging terrains. Robot experiment videos are at https://11chens.github.io/SLR/

Via

Access Paper or Ask Questions

LP-SLAM: Language-Perceptive RGB-D SLAM system based on Large Language Model

Mar 17, 2023

Weiyi Zhang, Yushi Guo, Liting Niu, Peijun Li, Chun Zhang, Zeyu Wan, Jiaxiang Yan, Fasih Ud Din Farrukh, Debing Zhang

Abstract:Simultaneous localization and mapping (SLAM) is a critical technology that enables autonomous robots to be aware of their surrounding environment. With the development of deep learning, SLAM systems can achieve a higher level of perception of the environment, including the semantic and text levels. However, current works are limited in their ability to achieve a natural-language level of perception of the world. To address this limitation, we propose LP-SLAM, the first language-perceptive SLAM system that leverages large language models (LLMs). LP-SLAM has two major features: (a) it can detect text in the scene and determine whether it represents a landmark to be stored during the tracking and mapping phase, and (b) it can understand natural language input from humans and provide guidance based on the generated map. We illustrated three usages of the LLM in the system including text cluster, landmark judgment, and natural language navigation. Our proposed system represents an advancement in the field of LLMs based SLAM and opens up new possibilities for autonomous robots to interact with their environment in a more natural and intuitive way.

* 12 pages, 16 figures

Via

Access Paper or Ask Questions

A simple normalization technique using window statistics to improve the out-of-distribution generalization on medical images

Jul 14, 2022

Chengfeng Zhou, Songchang Chen, Chenming Xu, Jun Wang, Feng Liu, Chun Zhang, Juan Ye, Hefeng Huang, Dahong Qian

Figure 1 for A simple normalization technique using window statistics to improve the out-of-distribution generalization on medical images

Figure 2 for A simple normalization technique using window statistics to improve the out-of-distribution generalization on medical images

Figure 3 for A simple normalization technique using window statistics to improve the out-of-distribution generalization on medical images

Figure 4 for A simple normalization technique using window statistics to improve the out-of-distribution generalization on medical images

Abstract:Since data scarcity and data heterogeneity are prevailing for medical images, well-trained Convolutional Neural Networks (CNNs) using previous normalization methods may perform poorly when deployed to a new site. However, a reliable model for real-world clinical applications should be able to generalize well both on in-distribution (IND) and out-of-distribution (OOD) data (e.g., the new site data). In this study, we present a novel normalization technique called window normalization (WIN) to improve the model generalization on heterogeneous medical images, which is a simple yet effective alternative to existing normalization methods. Specifically, WIN perturbs the normalizing statistics with the local statistics computed on the window of features. This feature-level augmentation technique regularizes the models well and improves their OOD generalization significantly. Taking its advantage, we propose a novel self-distillation method called WIN-WIN for classification tasks. WIN-WIN is easily implemented with twice forward passes and a consistency constraint, which can be a simple extension for existing methods. Extensive experimental results on various tasks (6 tasks) and datasets (24 datasets) demonstrate the generality and effectiveness of our methods.

Via

Access Paper or Ask Questions

Long-term prediction of chaotic systems with recurrent neural networks

Mar 06, 2020

Huawei Fan, Junjie Jiang, Chun Zhang, Xingang Wang, Ying-Cheng Lai

Figure 1 for Long-term prediction of chaotic systems with recurrent neural networks

Figure 2 for Long-term prediction of chaotic systems with recurrent neural networks

Figure 3 for Long-term prediction of chaotic systems with recurrent neural networks

Figure 4 for Long-term prediction of chaotic systems with recurrent neural networks

Abstract:Reservoir computing systems, a class of recurrent neural networks, have recently been exploited for model-free, data-based prediction of the state evolution of a variety of chaotic dynamical systems. The prediction horizon demonstrated has been about half dozen Lyapunov time. Is it possible to significantly extend the prediction time beyond what has been achieved so far? We articulate a scheme incorporating time-dependent but sparse data inputs into reservoir computing and demonstrate that such rare "updates" of the actual state practically enable an arbitrarily long prediction horizon for a variety of chaotic systems. A physical understanding based on the theory of temporal synchronization is developed.

* 10 pages, 8 figures

Via

Access Paper or Ask Questions

Essence Knowledge Distillation for Speech Recognition

Jun 26, 2019

Zhenchuan Yang, Chun Zhang, Weibin Zhang, Jianxiu Jin, Dongpeng Chen

Figure 1 for Essence Knowledge Distillation for Speech Recognition

Figure 2 for Essence Knowledge Distillation for Speech Recognition

Figure 3 for Essence Knowledge Distillation for Speech Recognition

Figure 4 for Essence Knowledge Distillation for Speech Recognition

Abstract:It is well known that a speech recognition system that combines multiple acoustic models trained on the same data significantly outperforms a single-model system. Unfortunately, real time speech recognition using a whole ensemble of models is too computationally expensive. In this paper, we propose to distill the knowledge of essence in an ensemble of models (i.e. the teacher model) to a single model (i.e. the student model) that needs much less computation to deploy. Previously, all the soften outputs of the teacher model are used to optimize the student model. We argue that not all the outputs of the ensemble are necessary to be distilled. Some of the outputs may even contain noisy information that is useless or even harmful to the training of the student model. In addition, we propose to train the student model with a multitask learning approach by utilizing both the soften outputs of the teacher model and the correct hard labels. The proposed method achieves some surprising results on the Switchboard data set. When the student model is trained together with the correct labels and the essence knowledge from the teacher model, it not only significantly outperforms another single model with the same architecture that is trained only with the correct labels, but also consistently outperforms the teacher model that is used to generate the soft labels.

* 5 pages, 2 figures

Via

Access Paper or Ask Questions

A Bio-inspired Collision Detecotr for Small Quadcopter

Jan 14, 2018

Jiannan Zhao, Cheng Hu, Chun Zhang, Zhihua Wang, Shigang Yue

Figure 1 for A Bio-inspired Collision Detecotr for Small Quadcopter

Figure 2 for A Bio-inspired Collision Detecotr for Small Quadcopter

Figure 3 for A Bio-inspired Collision Detecotr for Small Quadcopter

Figure 4 for A Bio-inspired Collision Detecotr for Small Quadcopter

Abstract:Sense and avoid capability enables insects to fly versatilely and robustly in dynamic complex environment. Their biological principles are so practical and efficient that inspired we human imitating them in our flying machines. In this paper, we studied a novel bio-inspired collision detector and its application on a quadcopter. The detector is inspired from LGMD neurons in the locusts, and modeled into an STM32F407 MCU. Compared to other collision detecting methods applied on quadcopters, we focused on enhancing the collision selectivity in a bio-inspired way that can considerably increase the computing efficiency during an obstacle detecting task even in complex dynamic environment. We designed the quadcopter's responding operation imminent collisions and tested this bio-inspired system in an indoor arena. The observed results from the experiments demonstrated that the LGMD collision detector is feasible to work as a vision module for the quadcopter's collision avoidance task.

* 7 pages, 29 figures

Via

Access Paper or Ask Questions