Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Qihan Ren

Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution

May 26, 2025

Jiahao Qiu, Xuan Qi, Tongcheng Zhang, Xinzhe Juan, Jiacheng Guo, Yifu Lu, Yimin Wang, Zixin Yao, Qihan Ren, Xun Jiang(+8 more)

Abstract:Recent advances in large language models (LLMs) have enabled agents to autonomously perform complex, open-ended tasks. However, many existing frameworks depend heavily on manually predefined tools and workflows, which hinder their adaptability, scalability, and generalization across domains. In this work, we introduce Alita--a generalist agent designed with the principle of "Simplicity is the ultimate sophistication," enabling scalable agentic reasoning through minimal predefinition and maximal self-evolution. For minimal predefinition, Alita is equipped with only one component for direct problem-solving, making it much simpler and neater than previous approaches that relied heavily on hand-crafted, elaborate tools and workflows. This clean design enhances its potential to generalize to challenging questions, without being limited by tools. For Maximal self-evolution, we enable the creativity of Alita by providing a suite of general-purpose components to autonomously construct, refine, and reuse external capabilities by generating task-related model context protocols (MCPs) from open source, which contributes to scalable agentic reasoning. Notably, Alita achieves 75.15% pass@1 and 87.27% pass@3 accuracy, which is top-ranking among general-purpose agents, on the GAIA benchmark validation dataset, 74.00% and 52.00% pass@1, respectively, on Mathvista and PathVQA, outperforming many agent systems with far greater complexity. More details will be updated at $\href{https://github.com/CharlesQ9/Alita}{https://github.com/CharlesQ9/Alita}$.

* 9 pages, 3 figures

Via

Access Paper or Ask Questions

Benchmarking Multimodal Knowledge Conflict for Large Multimodal Models

May 26, 2025

Yifan Jia, Kailin Jiang, Yuyang Liang, Qihan Ren, Yi Xin, Rui Yang, Fenze Feng, Mingcai Chen, Hengyang Lu, Haozhe Wang(+4 more)

Abstract:Large Multimodal Models(LMMs) face notable challenges when encountering multimodal knowledge conflicts, particularly under retrieval-augmented generation(RAG) frameworks where the contextual information from external sources may contradict the model's internal parametric knowledge, leading to unreliable outputs. However, existing benchmarks fail to reflect such realistic conflict scenarios. Most focus solely on intra-memory conflicts, while context-memory and inter-context conflicts remain largely investigated. Furthermore, commonly used factual knowledge-based evaluations are often overlooked, and existing datasets lack a thorough investigation into conflict detection capabilities. To bridge this gap, we propose MMKC-Bench, a benchmark designed to evaluate factual knowledge conflicts in both context-memory and inter-context scenarios. MMKC-Bench encompasses three types of multimodal knowledge conflicts and includes 1,573 knowledge instances and 3,381 images across 23 broad types, collected through automated pipelines with human verification. We evaluate three representative series of LMMs on both model behavior analysis and conflict detection tasks. Our findings show that while current LMMs are capable of recognizing knowledge conflicts, they tend to favor internal parametric knowledge over external evidence. We hope MMKC-Bench will foster further research in multimodal knowledge conflict and enhance the development of multimodal RAG systems. The source code is available at https://github.com/MLLMKCBENCH/MLLMKC.

* The source code is available at https://github.com/MLLMKCBENCH/MLLMKC

Via

Access Paper or Ask Questions

Towards the Dynamics of a DNN Learning Symbolic Interactions

Jul 27, 2024

Qihan Ren, Yang Xu, Junpeng Zhang, Yue Xin, Dongrui Liu, Quanshi Zhang

Abstract:This study proves the two-phase dynamics of a deep neural network (DNN) learning interactions. Despite the long disappointing view of the faithfulness of post-hoc explanation of a DNN, in recent years, a series of theorems have been proven to show that given an input sample, a small number of interactions between input variables can be considered as primitive inference patterns, which can faithfully represent every detailed inference logic of the DNN on this sample. Particularly, it has been observed that various DNNs all learn interactions of different complexities with two-phase dynamics, and this well explains how a DNN's generalization power changes from under-fitting to over-fitting. Therefore, in this study, we prove the dynamics of a DNN gradually encoding interactions of different complexities, which provides a theoretically grounded mechanism for the over-fitting of a DNN. Experiments show that our theory well predicts the real learning dynamics of various DNNs on different tasks.

Via

Access Paper or Ask Questions

Where We Have Arrived in Proving the Emergence of Sparse Symbolic Concepts in AI Models

May 03, 2023

Qihan Ren, Jiayang Gao, Wen Shen, Quanshi Zhang

Abstract:This paper aims to prove the emergence of symbolic concepts in well-trained AI models. We prove that if (1) the high-order derivatives of the model output w.r.t. the input variables are all zero, (2) the AI model can be used on occluded samples and will yield higher confidence when the input sample is less occluded, and (3) the confidence of the AI model does not significantly degrade on occluded samples, then the AI model will encode sparse interactive concepts. Each interactive concept represents an interaction between a specific set of input variables, and has a certain numerical effect on the inference score of the model. Specifically, it is proved that the inference score of the model can always be represented as the sum of the interaction effects of all interactive concepts. In fact, we hope to prove that conditions for the emergence of symbolic concepts are quite common. It means that for most AI models, we can usually use a small number of interactive concepts to mimic the model outputs on any arbitrarily masked samples.

Via

Access Paper or Ask Questions

Bayesian Neural Networks Tend to Ignore Complex and Sensitive Concepts

Feb 25, 2023

Qihan Ren, Huiqi Deng, Yunuo Chen, Siyu Lou, Quanshi Zhang

Figure 1 for Bayesian Neural Networks Tend to Ignore Complex and Sensitive Concepts

Figure 2 for Bayesian Neural Networks Tend to Ignore Complex and Sensitive Concepts

Figure 3 for Bayesian Neural Networks Tend to Ignore Complex and Sensitive Concepts

Figure 4 for Bayesian Neural Networks Tend to Ignore Complex and Sensitive Concepts

Abstract:In this paper, we focus on mean-field variational Bayesian Neural Networks (BNNs) and explore the representation capacity of such BNNs by investigating which types of concepts are less likely to be encoded by the BNN. It has been observed and studied that a relatively small set of interactive concepts usually emerge in the knowledge representation of a sufficiently-trained neural network, and such concepts can faithfully explain the network output. Based on this, our study proves that compared to standard deep neural networks (DNNs), it is less likely for BNNs to encode complex concepts. Experiments verify our theoretical proofs. Note that the tendency to encode less complex concepts does not necessarily imply weak representation power, considering that complex concepts exhibit low generalization power and high adversarial vulnerability.

Via

Access Paper or Ask Questions

Discovering and Explaining the Representation Bottleneck of DNNs

Nov 18, 2021

Huiqi Deng, Qihan Ren, Xu Chen, Hao Zhang, Jie Ren, Quanshi Zhang

Figure 1 for Discovering and Explaining the Representation Bottleneck of DNNs

Figure 2 for Discovering and Explaining the Representation Bottleneck of DNNs

Figure 3 for Discovering and Explaining the Representation Bottleneck of DNNs

Figure 4 for Discovering and Explaining the Representation Bottleneck of DNNs

Abstract:This paper explores the bottleneck of feature representations of deep neural networks (DNNs), from the perspective of the complexity of interactions between input variables encoded in DNNs. To this end, we focus on the multi-order interaction between input variables, where the order represents the complexity of interactions. We discover that a DNN is more likely to encode both too simple interactions and too complex interactions, but usually fails to learn interactions of intermediate complexity. Such a phenomenon is widely shared by different DNNs for different tasks. This phenomenon indicates a cognition gap between DNNs and human beings, and we call it a representation bottleneck. We theoretically prove the underlying reason for the representation bottleneck. Furthermore, we propose a loss to encourage/penalize the learning of interactions of specific complexities, and analyze the representation capacities of interactions of different complexities.

Via

Access Paper or Ask Questions

Interpreting Representation Quality of DNNs for 3D Point Cloud Processing

Nov 05, 2021

Wen Shen, Qihan Ren, Dongrui Liu, Quanshi Zhang

Figure 1 for Interpreting Representation Quality of DNNs for 3D Point Cloud Processing

Figure 2 for Interpreting Representation Quality of DNNs for 3D Point Cloud Processing

Figure 3 for Interpreting Representation Quality of DNNs for 3D Point Cloud Processing

Figure 4 for Interpreting Representation Quality of DNNs for 3D Point Cloud Processing

Abstract:In this paper, we evaluate the quality of knowledge representations encoded in deep neural networks (DNNs) for 3D point cloud processing. We propose a method to disentangle the overall model vulnerability into the sensitivity to the rotation, the translation, the scale, and local 3D structures. Besides, we also propose metrics to evaluate the spatial smoothness of encoding 3D structures, and the representation complexity of the DNN. Based on such analysis, experiments expose representation problems with classic DNNs, and explain the utility of the adversarial training.

Via

Access Paper or Ask Questions

Rotation-Equivariant Neural Networks for Privacy Protection

Jun 21, 2020

Hao Zhang, Yiting Chen, Haotian Ma, Xu Cheng, Qihan Ren, Liyao Xiang, Jie Shi, Quanshi Zhang

Figure 1 for Rotation-Equivariant Neural Networks for Privacy Protection

Figure 2 for Rotation-Equivariant Neural Networks for Privacy Protection

Figure 3 for Rotation-Equivariant Neural Networks for Privacy Protection

Figure 4 for Rotation-Equivariant Neural Networks for Privacy Protection

Abstract:In order to prevent leaking input information from intermediate-layer features, this paper proposes a method to revise the traditional neural network into the rotation-equivariant neural network (RENN). Compared to the traditional neural network, the RENN uses d-ary vectors/tensors as features, in which each element is a d-ary number. These d-ary features can be rotated (analogous to the rotation of a d-dimensional vector) with a random angle as the encryption process. Input information is hidden in this target phase of d-ary features for attribute obfuscation. Even if attackers have obtained network parameters and intermediate-layer features, they cannot extract input information without knowing the target phase. Hence, the input privacy can be effectively protected by the RENN. Besides, the output accuracy of RENNs only degrades mildly compared to traditional neural networks, and the computational cost is significantly less than the homomorphic encryption.

* arXiv admin note: text overlap with arXiv:2003.08365

Via

Access Paper or Ask Questions