Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Wenyuan Zhang

S1-Bench: A Simple Benchmark for Evaluating System 1 Thinking Capability of Large Reasoning Models

Apr 14, 2025

Wenyuan Zhang, Shuaiyi Nie, Xinghua Zhang, Zefeng Zhang, Tingwen Liu

Abstract:We introduce S1-Bench, a novel benchmark designed to evaluate Large Reasoning Models' (LRMs) performance on simple tasks that favor intuitive system 1 thinking rather than deliberative system 2 reasoning. While LRMs have achieved significant breakthroughs in complex reasoning tasks through explicit chains of thought, their reliance on deep analytical thinking may limit their system 1 thinking capabilities. Moreover, a lack of benchmark currently exists to evaluate LRMs' performance in tasks that require such capabilities. To fill this gap, S1-Bench presents a set of simple, diverse, and naturally clear questions across multiple domains and languages, specifically designed to assess LRMs' performance in such tasks. Our comprehensive evaluation of 22 LRMs reveals significant lower efficiency tendencies, with outputs averaging 15.5 times longer than those of traditional small LLMs. Additionally, LRMs often identify correct answers early but continue unnecessary deliberation, with some models even producing numerous errors. These findings highlight the rigid reasoning patterns of current LRMs and underscore the substantial development needed to achieve balanced dual-system thinking capabilities that can adapt appropriately to task complexity.

* Work in Progress

Via

Access Paper or Ask Questions

SOTOPIA-Ω: Dynamic Strategy Injection Learning and Social Instrucion Following Evaluation for Social Agents

Feb 21, 2025

Wenyuan Zhang, Tianyun Liu, Mengxiao Song, Xiaodong Li, Tingwen Liu

Abstract:Despite the abundance of prior social strategies possessed by humans, there remains a paucity of research dedicated to their transfer and integration into social agents. Our proposed SOTOPIA-{\Omega} framework aims to address and bridge this gap, with a particular focus on enhancing the social capabilities of language agents. This framework dynamically injects multi-step reasoning strategies inspired by negotiation theory, along with two simple direct strategies, into expert agents, thereby automating the construction of high-quality social dialogue training corpus. Additionally, we introduce the concept of Social Instruction Following (S-IF) and propose two new S-IF evaluation metrics that are complementary to social capability. We demonstrate that several 7B models trained on high-quality corpus not only significantly surpass the expert agent (GPT-4) in achieving social goals but also enhance S-IF performance. Analysis and variant experiments validate the advantages of dynamic construction, which can especially break the agent's prolonged deadlock.

* 26 pages, 5 figures, 23 tables

Via

Access Paper or Ask Questions

Sparis: Neural Implicit Surface Reconstruction of Indoor Scenes from Sparse Views

Jan 02, 2025

Yulun Wu, Han Huang, Wenyuan Zhang, Chao Deng, Ge Gao, Ming Gu, Yu-Shen Liu

Figure 1 for Sparis: Neural Implicit Surface Reconstruction of Indoor Scenes from Sparse Views

Figure 2 for Sparis: Neural Implicit Surface Reconstruction of Indoor Scenes from Sparse Views

Figure 3 for Sparis: Neural Implicit Surface Reconstruction of Indoor Scenes from Sparse Views

Figure 4 for Sparis: Neural Implicit Surface Reconstruction of Indoor Scenes from Sparse Views

Abstract:In recent years, reconstructing indoor scene geometry from multi-view images has achieved encouraging accomplishments. Current methods incorporate monocular priors into neural implicit surface models to achieve high-quality reconstructions. However, these methods require hundreds of images for scene reconstruction. When only a limited number of views are available as input, the performance of monocular priors deteriorates due to scale ambiguity, leading to the collapse of the reconstructed scene geometry. In this paper, we propose a new method, named Sparis, for indoor surface reconstruction from sparse views. Specifically, we investigate the impact of monocular priors on sparse scene reconstruction, introducing a novel prior based on inter-image matching information. Our prior offers more accurate depth information while ensuring cross-view matching consistency. Additionally, we employ an angular filter strategy and an epipolar matching weight function, aiming to reduce errors due to view matching inaccuracies, thereby refining the inter-image prior for improved reconstruction accuracy. The experiments conducted on widely used benchmarks demonstrate superior performance in sparse-view scene reconstruction.

* Accepted by AAAI 2025. Project page: https://yulunwu0108.github.io/Sparis/

Via

Access Paper or Ask Questions

Neural Signed Distance Function Inference through Splatting 3D Gaussians Pulled on Zero-Level Set

Oct 18, 2024

Wenyuan Zhang, Yu-Shen Liu, Zhizhong Han

Abstract:It is vital to infer a signed distance function (SDF) in multi-view based surface reconstruction. 3D Gaussian splatting (3DGS) provides a novel perspective for volume rendering, and shows advantages in rendering efficiency and quality. Although 3DGS provides a promising neural rendering option, it is still hard to infer SDFs for surface reconstruction with 3DGS due to the discreteness, the sparseness, and the off-surface drift of 3D Gaussians. To resolve these issues, we propose a method that seamlessly merge 3DGS with the learning of neural SDFs. Our key idea is to more effectively constrain the SDF inference with the multi-view consistency. To this end, we dynamically align 3D Gaussians on the zero-level set of the neural SDF using neural pulling, and then render the aligned 3D Gaussians through the differentiable rasterization. Meanwhile, we update the neural SDF by pulling neighboring space to the pulled 3D Gaussians, which progressively refine the signed distance field near the surface. With both differentiable pulling and splatting, we jointly optimize 3D Gaussians and the neural SDF with both RGB and geometry constraints, which recovers more accurate, smooth, and complete surfaces with more geometry details. Our numerical and visual comparisons show our superiority over the state-of-the-art results on the widely used benchmarks.

* Accepted by NeurIPS 2024. Project page: https://wen-yuan-zhang.github.io/GS-Pull/

Via

Access Paper or Ask Questions

MoR: Mixture of Ranks for Low-Rank Adaptation Tuning

Oct 17, 2024

Chuanyu Tang, Yilong Chen, Zhenyu Zhang, Junyuan Shang, Wenyuan Zhang, Yong Huang, Tingwen Liu

Figure 1 for MoR: Mixture of Ranks for Low-Rank Adaptation Tuning

Figure 2 for MoR: Mixture of Ranks for Low-Rank Adaptation Tuning

Figure 3 for MoR: Mixture of Ranks for Low-Rank Adaptation Tuning

Figure 4 for MoR: Mixture of Ranks for Low-Rank Adaptation Tuning

Abstract:Low-Rank Adaptation (LoRA) drives research to align its performance with full fine-tuning. However, significant challenges remain: (1) Simply increasing the rank size of LoRA does not effectively capture high-rank information, which leads to a performance bottleneck.(2) MoE-style LoRA methods substantially increase parameters and inference latency, contradicting the goals of efficient fine-tuning and ease of application. To address these challenges, we introduce Mixture of Ranks (MoR), which learns rank-specific information for different tasks based on input and efficiently integrates multi-rank information. We firstly propose a new framework that equates the integration of multiple LoRAs to expanding the rank of LoRA. Moreover, we hypothesize that low-rank LoRA already captures sufficient intrinsic information, and MoR can derive high-rank information through mathematical transformations of the low-rank components. Thus, MoR can reduces the learning difficulty of LoRA and enhances its multi-task capabilities. MoR achieves impressive results, with MoR delivering a 1.31\% performance improvement while using only 93.93\% of the parameters compared to baseline methods.

* 11 pages, 7 figures

Via

Access Paper or Ask Questions

Revealing the Challenge of Detecting Character Knowledge Errors in LLM Role-Playing

Sep 18, 2024

Wenyuan Zhang, Jiawei Sheng, Shuaiyi Nie, Zefeng Zhang, Xinghua Zhang, Yongquan He, Tingwen Liu

Figure 1 for Revealing the Challenge of Detecting Character Knowledge Errors in LLM Role-Playing

Figure 2 for Revealing the Challenge of Detecting Character Knowledge Errors in LLM Role-Playing

Figure 3 for Revealing the Challenge of Detecting Character Knowledge Errors in LLM Role-Playing

Figure 4 for Revealing the Challenge of Detecting Character Knowledge Errors in LLM Role-Playing

Abstract:Large language model (LLM) role-playing has gained widespread attention, where the authentic character knowledge is crucial for constructing realistic LLM role-playing agents. However, existing works usually overlook the exploration of LLMs' ability to detect characters' known knowledge errors (KKE) and unknown knowledge errors (UKE) while playing roles, which would lead to low-quality automatic construction of character trainable corpus. In this paper, we propose a probing dataset to evaluate LLMs' ability to detect errors in KKE and UKE. The results indicate that even the latest LLMs struggle to effectively detect these two types of errors, especially when it comes to familiar knowledge. We experimented with various reasoning strategies and propose an agent-based reasoning method, Self-Recollection and Self-Doubt (S2RD), to further explore the potential for improving error detection capabilities. Experiments show that our method effectively improves the LLMs' ability to detect error character knowledge, but it remains an issue that requires ongoing attention.

* 22 pages, 14 figures

Via

Access Paper or Ask Questions

Learning Unsigned Distance Functions from Multi-view Images with Volume Rendering Priors

Jul 23, 2024

Wenyuan Zhang, Kanle Shi, Yu-Shen Liu, Zhizhong Han

Figure 1 for Learning Unsigned Distance Functions from Multi-view Images with Volume Rendering Priors

Figure 2 for Learning Unsigned Distance Functions from Multi-view Images with Volume Rendering Priors

Figure 3 for Learning Unsigned Distance Functions from Multi-view Images with Volume Rendering Priors

Figure 4 for Learning Unsigned Distance Functions from Multi-view Images with Volume Rendering Priors

Abstract:Unsigned distance functions (UDFs) have been a vital representation for open surfaces. With different differentiable renderers, current methods are able to train neural networks to infer a UDF by minimizing the rendering errors on the UDF to the multi-view ground truth. However, these differentiable renderers are mainly handcrafted, which makes them either biased on ray-surface intersections, or sensitive to unsigned distance outliers, or not scalable to large scale scenes. To resolve these issues, we present a novel differentiable renderer to infer UDFs more accurately. Instead of using handcrafted equations, our differentiable renderer is a neural network which is pre-trained in a data-driven manner. It learns how to render unsigned distances into depth images, leading to a prior knowledge, dubbed volume rendering priors. To infer a UDF for an unseen scene from multiple RGB images, we generalize the learned volume rendering priors to map inferred unsigned distances in alpha blending for RGB image rendering. Our results show that the learned volume rendering priors are unbiased, robust, scalable, 3D aware, and more importantly, easy to learn. We evaluate our method on both widely used benchmarks and real scenes, and report superior performance over the state-of-the-art methods.

* Accepted by ECCV 2024. Project page: https://wen-yuan-zhang.github.io/VolumeRenderingPriors/

Via

Access Paper or Ask Questions

UDQL: Bridging The Gap between MSE Loss and The Optimal Value Function in Offline Reinforcement Learning

Jun 05, 2024

Yu Zhang, Rui Yu, Zhipeng Yao, Wenyuan Zhang, Jun Wang, Liming Zhang

Figure 1 for UDQL: Bridging The Gap between MSE Loss and The Optimal Value Function in Offline Reinforcement Learning

Figure 2 for UDQL: Bridging The Gap between MSE Loss and The Optimal Value Function in Offline Reinforcement Learning

Figure 3 for UDQL: Bridging The Gap between MSE Loss and The Optimal Value Function in Offline Reinforcement Learning

Abstract:The Mean Square Error (MSE) is commonly utilized to estimate the solution of the optimal value function in the vast majority of offline reinforcement learning (RL) models and has achieved outstanding performance. However, we find that its principle can lead to overestimation phenomenon for the value function. In this paper, we first theoretically analyze overestimation phenomenon led by MSE and provide the theoretical upper bound of the overestimated error. Furthermore, to address it, we propose a novel Bellman underestimated operator to counteract overestimation phenomenon and then prove its contraction characteristics. At last, we propose the offline RL algorithm based on underestimated operator and diffusion policy model. Extensive experimental results on D4RL tasks show that our method can outperform state-of-the-art offline RL algorithms, which demonstrates that our theoretical analysis and underestimation way are effective for offline RL tasks.

Via

Access Paper or Ask Questions

Optimal Transport Guided Correlation Assignment for Multimodal Entity Linking

Jun 04, 2024

Zefeng Zhang, Jiawei Sheng, Chuang Zhang, Yunzhi Liang, Wenyuan Zhang, Siqi Wang, Tingwen Liu

Figure 1 for Optimal Transport Guided Correlation Assignment for Multimodal Entity Linking

Figure 2 for Optimal Transport Guided Correlation Assignment for Multimodal Entity Linking

Figure 3 for Optimal Transport Guided Correlation Assignment for Multimodal Entity Linking

Figure 4 for Optimal Transport Guided Correlation Assignment for Multimodal Entity Linking

Abstract:Multimodal Entity Linking (MEL) aims to link ambiguous mentions in multimodal contexts to entities in a multimodal knowledge graph. A pivotal challenge is to fully leverage multi-element correlations between mentions and entities to bridge modality gap and enable fine-grained semantic matching. Existing methods attempt several local correlative mechanisms, relying heavily on the automatically learned attention weights, which may over-concentrate on partial correlations. To mitigate this issue, we formulate the correlation assignment problem as an optimal transport (OT) problem, and propose a novel MEL framework, namely OT-MEL, with OT-guided correlation assignment. Thereby, we exploit the correlation between multimodal features to enhance multimodal fusion, and the correlation between mentions and entities to enhance fine-grained matching. To accelerate model prediction, we further leverage knowledge distillation to transfer OT assignment knowledge to attention mechanism. Experimental results show that our model significantly outperforms previous state-of-the-art baselines and confirm the effectiveness of the OT-guided correlation assignment.

* Findings of ACL 2024

Via

Access Paper or Ask Questions

SparseAD: Sparse Query-Centric Paradigm for Efficient End-to-End Autonomous Driving

Apr 10, 2024

Diankun Zhang, Guoan Wang, Runwen Zhu, Jianbo Zhao, Xiwu Chen, Siyu Zhang, Jiahao Gong, Qibin Zhou, Wenyuan Zhang, Ningzi Wang(+8 more)

Figure 1 for SparseAD: Sparse Query-Centric Paradigm for Efficient End-to-End Autonomous Driving

Figure 2 for SparseAD: Sparse Query-Centric Paradigm for Efficient End-to-End Autonomous Driving

Figure 3 for SparseAD: Sparse Query-Centric Paradigm for Efficient End-to-End Autonomous Driving

Figure 4 for SparseAD: Sparse Query-Centric Paradigm for Efficient End-to-End Autonomous Driving

Abstract:End-to-End paradigms use a unified framework to implement multi-tasks in an autonomous driving system. Despite simplicity and clarity, the performance of end-to-end autonomous driving methods on sub-tasks is still far behind the single-task methods. Meanwhile, the widely used dense BEV features in previous end-to-end methods make it costly to extend to more modalities or tasks. In this paper, we propose a Sparse query-centric paradigm for end-to-end Autonomous Driving (SparseAD), where the sparse queries completely represent the whole driving scenario across space, time and tasks without any dense BEV representation. Concretely, we design a unified sparse architecture for perception tasks including detection, tracking, and online mapping. Moreover, we revisit motion prediction and planning, and devise a more justifiable motion planner framework. On the challenging nuScenes dataset, SparseAD achieves SOTA full-task performance among end-to-end methods and significantly narrows the performance gap between end-to-end paradigms and single-task methods. Codes will be released soon.

Via

Access Paper or Ask Questions