Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Haoran Sun

One-DoF Robotic Design of Overconstrained Limbs with Energy-Efficient, Self-Collision-Free Motion

Sep 26, 2025

Yuping Gu, Bangchao Huang, Haoran Sun, Ronghan Xu, Jiayi Yin, Wei Zhang, Fang Wan, Jia Pan, Chaoyang Song

Abstract:While it is expected to build robotic limbs with multiple degrees of freedom (DoF) inspired by nature, a single DoF design remains fundamental, providing benefits that include, but are not limited to, simplicity, robustness, cost-effectiveness, and efficiency. Mechanisms, especially those with multiple links and revolute joints connected in closed loops, play an enabling factor in introducing motion diversity for 1-DoF systems, which are usually constrained by self-collision during a full-cycle range of motion. This study presents a novel computational approach to designing one-degree-of-freedom (1-DoF) overconstrained robotic limbs for a desired spatial trajectory, while achieving energy-efficient, self-collision-free motion in full-cycle rotations. Firstly, we present the geometric optimization problem of linkage-based robotic limbs in a generalized formulation for self-collision-free design. Next, we formulate the spatial trajectory generation problem with the overconstrained linkages by optimizing the similarity and dynamic-related metrics. We further optimize the geometric shape of the overconstrained linkage to ensure smooth and collision-free motion driven by a single actuator. We validated our proposed method through various experiments, including personalized automata and bio-inspired hexapod robots. The resulting hexapod robot, featuring overconstrained robotic limbs, demonstrated outstanding energy efficiency during forward walking.

* 23 pages, 11 figures, 2 tables. Accepted by Fundamental Research. For Supplementary Videos, see https://bionicdl.ancorasir.com/?p=1668

Via

Access Paper or Ask Questions

Benefits and Pitfalls of Reinforcement Learning for Language Model Planning: A Theoretical Perspective

Sep 26, 2025

Siwei Wang, Yifei Shen, Haoran Sun, Shi Feng, Shang-Hua Teng, Li Dong, Yaru Hao, Wei Chen

Abstract:Recent reinforcement learning (RL) methods have substantially enhanced the planning capabilities of Large Language Models (LLMs), yet the theoretical basis for their effectiveness remains elusive. In this work, we investigate RL's benefits and limitations through a tractable graph-based abstraction, focusing on policy gradient (PG) and Q-learning methods. Our theoretical analyses reveal that supervised fine-tuning (SFT) may introduce co-occurrence-based spurious solutions, whereas RL achieves correct planning primarily through exploration, underscoring exploration's role in enabling better generalization. However, we also show that PG suffers from diversity collapse, where output diversity decreases during training and persists even after perfect accuracy is attained. By contrast, Q-learning provides two key advantages: off-policy learning and diversity preservation at convergence. We further demonstrate that careful reward design is necessary to prevent reward hacking in Q-learning. Finally, applying our framework to the real-world planning benchmark Blocksworld, we confirm that these behaviors manifest in practice.

Via

Access Paper or Ask Questions

An Uncertainty-Driven Adaptive Self-Alignment Framework for Large Language Models

Jul 23, 2025

Haoran Sun, Zekun Zhang, Shaoning Zeng

Abstract:Large Language Models (LLMs) have demonstrated remarkable progress in instruction following and general-purpose reasoning. However, achieving high-quality alignment with human intent and safety norms without human annotations remains a fundamental challenge. In this work, we propose an Uncertainty-Driven Adaptive Self-Alignment (UDASA) framework designed to improve LLM alignment in a fully automated manner. UDASA first generates multiple responses for each input and quantifies output uncertainty across three dimensions: semantics, factuality, and value alignment. Based on these uncertainty scores, the framework constructs preference pairs and categorizes training samples into three stages, conservative, moderate, and exploratory, according to their uncertainty difference. The model is then optimized progressively across these stages. In addition, we conduct a series of preliminary studies to validate the core design assumptions and provide strong empirical motivation for the proposed framework. Experimental results show that UDASA outperforms existing alignment methods across multiple tasks, including harmlessness, helpfulness, truthfulness, and controlled sentiment generation, significantly improving model performance.

Via

Access Paper or Ask Questions

Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought

May 21, 2025

Ao Liu, Botong Zhou, Can Xu, Chayse Zhou, ChenChen Zhang, Chengcheng Xu, Chenhao Wang, Decheng Wu, Dengpeng Wu, Dian Jiao(+239 more)

Abstract:As Large Language Models (LLMs) rapidly advance, we introduce Hunyuan-TurboS, a novel large hybrid Transformer-Mamba Mixture of Experts (MoE) model. It synergistically combines Mamba's long-sequence processing efficiency with Transformer's superior contextual understanding. Hunyuan-TurboS features an adaptive long-short chain-of-thought (CoT) mechanism, dynamically switching between rapid responses for simple queries and deep "thinking" modes for complex problems, optimizing computational resources. Architecturally, this 56B activated (560B total) parameter model employs 128 layers (Mamba2, Attention, FFN) with an innovative AMF/MF block pattern. Faster Mamba2 ensures linear complexity, Grouped-Query Attention minimizes KV cache, and FFNs use an MoE structure. Pre-trained on 16T high-quality tokens, it supports a 256K context length and is the first industry-deployed large-scale Mamba model. Our comprehensive post-training strategy enhances capabilities via Supervised Fine-Tuning (3M instructions), a novel Adaptive Long-short CoT Fusion method, Multi-round Deliberation Learning for iterative improvement, and a two-stage Large-scale Reinforcement Learning process targeting STEM and general instruction-following. Evaluations show strong performance: overall top 7 rank on LMSYS Chatbot Arena with a score of 1356, outperforming leading models like Gemini-2.0-Flash-001 (1352) and o4-mini-2025-04-16 (1345). TurboS also achieves an average of 77.9% across 23 automated benchmarks. Hunyuan-TurboS balances high performance and efficiency, offering substantial capabilities at lower inference costs than many reasoning models, establishing a new paradigm for efficient large-scale pre-trained models.

Via

Access Paper or Ask Questions

Proceedings of 1st Workshop on Advancing Artificial Intelligence through Theory of Mind

Apr 28, 2025

Mouad Abrini, Omri Abend, Dina Acklin, Henny Admoni, Gregor Aichinger, Nitay Alon, Zahra Ashktorab, Ashish Atreja, Moises Auron, Alexander Aufreiter(+98 more)

Abstract:This volume includes a selection of papers presented at the Workshop on Advancing Artificial Intelligence through Theory of Mind held at AAAI 2025 in Philadelphia US on 3rd March 2025. The purpose of this volume is to provide an open access and curated anthology for the ToM and AI research community.

* workshop proceedings

Via

Access Paper or Ask Questions

NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images: Methods and Results

Apr 19, 2025

Xin Li, Yeying Jin, Xin Jin, Zongwei Wu, Bingchen Li, Yufei Wang, Wenhan Yang, Yu Li, Zhibo Chen, Bihan Wen(+127 more)

Figure 1 for NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images: Methods and Results

Figure 2 for NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images: Methods and Results

Figure 3 for NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images: Methods and Results

Figure 4 for NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images: Methods and Results

Abstract:This paper reviews the NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images. This challenge received a wide range of impressive solutions, which are developed and evaluated using our collected real-world Raindrop Clarity dataset. Unlike existing deraining datasets, our Raindrop Clarity dataset is more diverse and challenging in degradation types and contents, which includes day raindrop-focused, day background-focused, night raindrop-focused, and night background-focused degradations. This dataset is divided into three subsets for competition: 14,139 images for training, 240 images for validation, and 731 images for testing. The primary objective of this challenge is to establish a new and powerful benchmark for the task of removing raindrops under varying lighting and focus conditions. There are a total of 361 participants in the competition, and 32 teams submitting valid solutions and fact sheets for the final testing phase. These submissions achieved state-of-the-art (SOTA) performance on the Raindrop Clarity dataset. The project can be found at https://lixinustc.github.io/CVPR-NTIRE2025-RainDrop-Competition.github.io/.

* Challenge Report of CVPR NTIRE 2025; 26 pages; Methods from 32 teams

Via

Access Paper or Ask Questions

NTIRE 2025 Challenge on Cross-Domain Few-Shot Object Detection: Methods and Results

Apr 14, 2025

Yuqian Fu, Xingyu Qiu, Bin Ren, Yanwei Fu, Radu Timofte, Nicu Sebe, Ming-Hsuan Yang, Luc Van Gool, Kaijin Zhang, Qingpeng Nong(+52 more)

Abstract:Cross-Domain Few-Shot Object Detection (CD-FSOD) poses significant challenges to existing object detection and few-shot detection models when applied across domains. In conjunction with NTIRE 2025, we organized the 1st CD-FSOD Challenge, aiming to advance the performance of current object detectors on entirely novel target domains with only limited labeled data. The challenge attracted 152 registered participants, received submissions from 42 teams, and concluded with 13 teams making valid final submissions. Participants approached the task from diverse perspectives, proposing novel models that achieved new state-of-the-art (SOTA) results under both open-source and closed-source settings. In this report, we present an overview of the 1st NTIRE 2025 CD-FSOD Challenge, highlighting the proposed solutions and summarizing the results submitted by the participants.

* accepted by CVPRW 25 @ NTIRE

Via

Access Paper or Ask Questions

HiSTF Mamba: Hierarchical Spatiotemporal Fusion with Multi-Granular Body-Spatial Modeling for High-Fidelity Text-to-Motion Generation

Mar 10, 2025

Xingzu Zhan, Chen Xie, Haoran Sun, Xiaochun Mai

Abstract:Text-to-motion generation is a rapidly growing field at the nexus of multimodal learning and computer graphics, promising flexible and cost-effective applications in gaming, animation, robotics, and virtual reality. Existing approaches often rely on simple spatiotemporal stacking, which introduces feature redundancy, while subtle joint-level details remain overlooked from a spatial perspective. To this end, we propose a novel HiSTF Mamba framework. The framework is composed of three key modules: Dual-Spatial Mamba, Bi-Temporal Mamba, and Dynamic Spatiotemporal Fusion Module (DSFM). Dual-Spatial Mamba incorporates ``Part-based + Whole-based'' parallel modeling to represent both whole-body coordination and fine-grained joint dynamics. Bi-Temporal Mamba adopts a bidirectional scanning strategy, effectively encoding short-term motion details and long-term dependencies. DSFM further performs redundancy removal and extraction of complementary information for temporal features, then fuses them with spatial features, yielding an expressive spatio-temporal representation. Experimental results on the HumanML3D dataset demonstrate that HiSTF Mamba achieves state-of-the-art performance across multiple metrics. In particular, it reduces the FID score from 0.283 to 0.189, a relative decrease of nearly 30%. These findings validate the effectiveness of HiSTF Mamba in achieving high fidelity and strong semantic alignment in text-to-motion generation.

* 11pages,3figures,

Via

Access Paper or Ask Questions

Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs

Feb 26, 2025

Zhaowei Zhang, Fengshuo Bai, Qizhi Chen, Chengdong Ma, Mingzhi Wang, Haoran Sun, Zilong Zheng, Yaodong Yang

Figure 1 for Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs

Figure 2 for Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs

Figure 3 for Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs

Figure 4 for Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs

Abstract:How to align large language models (LLMs) with user preferences from a static general dataset has been frequently studied. However, user preferences are usually personalized, changing, and diverse regarding culture, values, or time. This leads to the problem that the actual user preferences often do not coincide with those trained by the model developers in the practical use of LLMs. Since we cannot collect enough data and retrain for every demand, researching efficient real-time preference adaptation methods based on the backbone LLMs during test time is important. To this end, we introduce Amulet, a novel, training-free framework that formulates the decoding process of every token as a separate online learning problem with the guidance of simple user-provided prompts, thus enabling real-time optimization to satisfy users' personalized preferences. To reduce the computational cost brought by this optimization process for each token, we additionally provide a closed-form solution for each iteration step of the optimization process, thereby reducing the computational time cost to a negligible level. The detailed experimental results demonstrate that Amulet can achieve significant performance improvements in rich settings with combinations of different LLMs, datasets, and user preferences, while maintaining acceptable computational efficiency.

* Accepted by ICLR 2025, Project page: https://zowiezhang.github.io/projects/Amulet

Via

Access Paper or Ask Questions

LCV2I: Communication-Efficient and High-Performance Collaborative Perception Framework with Low-Resolution LiDAR

Feb 24, 2025

Xinxin Feng, Haoran Sun, Haifeng Zheng, Huacong Chen, Wenqiang Chen

Abstract:Vehicle-to-Infrastructure (V2I) collaborative perception leverages data collected by infrastructure's sensors to enhance vehicle perceptual capabilities. LiDAR, as a commonly used sensor in cooperative perception, is widely equipped in intelligent vehicles and infrastructure. However, its superior performance comes with a correspondingly high cost. To achieve low-cost V2I, reducing the cost of LiDAR is crucial. Therefore, we study adopting low-resolution LiDAR on the vehicle to minimize cost as much as possible. However, simply reducing the resolution of vehicle's LiDAR results in sparse point clouds, making distant small objects even more blurred. Additionally, traditional communication methods have relatively low bandwidth utilization efficiency. These factors pose challenges for us. To balance cost and perceptual accuracy, we propose a new collaborative perception framework, namely LCV2I. LCV2I uses data collected from cameras and low-resolution LiDAR as input. It also employs feature offset correction modules and regional feature enhancement algorithms to improve feature representation. Finally, we use regional difference map and regional score map to assess the value of collaboration content, thereby improving communication bandwidth efficiency. In summary, our approach achieves high perceptual performance while substantially reducing the demand for high-resolution sensors on the vehicle. To evaluate this algorithm, we conduct 3D object detection in the real-world scenario of DAIR-V2X, demonstrating that the performance of LCV2I consistently surpasses currently existing algorithms.

Via

Access Paper or Ask Questions