Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Liang He

Code-Driven Inductive Synthesis: Enhancing Reasoning Abilities of Large Language Models with Sequences

Mar 17, 2025

Kedi Chen, Zhikai Lei, Fan Zhang, Yinqi Zhang, Qin Chen, Jie Zhou, Liang He, Qipeng Guo, Kai Chen, Wei Zhang

Abstract:Large language models make remarkable progress in reasoning capabilities. Existing works focus mainly on deductive reasoning tasks (e.g., code and math), while another type of reasoning mode that better aligns with human learning, inductive reasoning, is not well studied. We attribute the reason to the fact that obtaining high-quality process supervision data is challenging for inductive reasoning. Towards this end, we novelly employ number sequences as the source of inductive reasoning data. We package sequences into algorithmic problems to find the general term of each sequence through a code solution. In this way, we can verify whether the code solution holds for any term in the current sequence, and inject case-based supervision signals by using code unit tests. We build a sequence synthetic data pipeline and form a training dataset CodeSeq. Experimental results show that the models tuned with CodeSeq improve on both code and comprehensive reasoning benchmarks.

Via

Access Paper or Ask Questions

A Survey on the Optimization of Large Language Model-based Agents

Mar 16, 2025

Shangheng Du, Jiabao Zhao, Jinxin Shi, Zhentao Xie, Xin Jiang, Yanhong Bai, Liang He

Abstract:With the rapid development of Large Language Models (LLMs), LLM-based agents have been widely adopted in various fields, becoming essential for autonomous decision-making and interactive tasks. However, current work typically relies on prompt design or fine-tuning strategies applied to vanilla LLMs, which often leads to limited effectiveness or suboptimal performance in complex agent-related environments. Although LLM optimization techniques can improve model performance across many general tasks, they lack specialized optimization towards critical agent functionalities such as long-term planning, dynamic environmental interaction, and complex decision-making. Although numerous recent studies have explored various strategies to optimize LLM-based agents for complex agent tasks, a systematic review summarizing and comparing these methods from a holistic perspective is still lacking. In this survey, we provide a comprehensive review of LLM-based agent optimization approaches, categorizing them into parameter-driven and parameter-free methods. We first focus on parameter-driven optimization, covering fine-tuning-based optimization, reinforcement learning-based optimization, and hybrid strategies, analyzing key aspects such as trajectory data construction, fine-tuning techniques, reward function design, and optimization algorithms. Additionally, we briefly discuss parameter-free strategies that optimize agent behavior through prompt engineering and external knowledge retrieval. Finally, we summarize the datasets and benchmarks used for evaluation and tuning, review key applications of LLM-based agents, and discuss major challenges and promising future directions. Our repository for related references is available at https://github.com/YoungDubbyDu/LLM-Agent-Optimization.

Via

Access Paper or Ask Questions

Easy-Poly: A Easy Polyhedral Framework For 3D Multi-Object Tracking

Feb 26, 2025

Peng Zhang, Xin Li, Xin Lin, Liang He

Abstract:Recent advancements in 3D multi-object tracking (3D MOT) have predominantly relied on tracking-by-detection pipelines. However, these approaches often neglect potential enhancements in 3D detection processes, leading to high false positives (FP), missed detections (FN), and identity switches (IDS), particularly in challenging scenarios such as crowded scenes, small-object configurations, and adverse weather conditions. Furthermore, limitations in data preprocessing, association mechanisms, motion modeling, and life-cycle management hinder overall tracking robustness. To address these issues, we present Easy-Poly, a real-time, filter-based 3D MOT framework for multiple object categories. Our contributions include: (1) An Augmented Proposal Generator utilizing multi-modal data augmentation and refined SpConv operations, significantly improving mAP and NDS on nuScenes; (2) A Dynamic Track-Oriented (DTO) data association algorithm that effectively manages uncertainties and occlusions through optimal assignment and multiple hypothesis handling; (3) A Dynamic Motion Modeling (DMM) incorporating a confidence-weighted Kalman filter and adaptive noise covariances, enhancing MOTA and AMOTA in challenging conditions; and (4) An extended life-cycle management system with adjustive thresholds to reduce ID switches and false terminations. Experimental results show that Easy-Poly outperforms state-of-the-art methods such as Poly-MOT and Fast-Poly, achieving notable gains in mAP (e.g., from 63.30% to 64.96% with LargeKernel3D) and AMOTA (e.g., from 73.1% to 74.5%), while also running in real-time. These findings highlight Easy-Poly's adaptability and robustness in diverse scenarios, making it a compelling choice for autonomous driving and related 3D MOT applications. The source code of this paper will be published upon acceptance.

* 8 pages, 3 figures, 5 tables

Via

Access Paper or Ask Questions

Stretchable Capacitive and Resistive Strain Sensors: Accessible Manufacturing Using Direct Ink Writing

Feb 25, 2025

Lukas Cha, Sonja Groß, Shuai Mao, Tim Braun, Sami Haddadin, Liang He

Abstract:As robotics advances toward integrating soft structures, anthropomorphic shapes, and complex tasks, soft and highly stretchable mechanotransducers are becoming essential. To reliably measure tactile and proprioceptive data while ensuring shape conformability, stretchability, and adaptability, researchers have explored diverse transduction principles alongside scalable and versatile manufacturing techniques. Nonetheless, many current methods for stretchable sensors are designed to produce a single sensor configuration, thereby limiting design flexibility. Here, we present an accessible, flexible, printing-based fabrication approach for customizable, stretchable sensors. Our method employs a custom-built printhead integrated with a commercial 3D printer to enable direct ink writing (DIW) of conductive ink onto cured silicone substrates. A layer-wise fabrication process, facilitated by stackable trays, allows for the deposition of multiple liquid conductive ink layers within a silicone matrix. To demonstrate the method's capacity for high design flexibility, we fabricate and evaluate both capacitive and resistive strain sensor morphologies. Experimental characterization showed that the capacitive strain sensor possesses high linearity (R^2 = 0.99), high sensitivity near the 1.0 theoretical limit (GF = 0.95), minimal hysteresis (DH = 1.36%), and large stretchability (550%), comparable to state-of-the-art stretchable strain sensors reported in the literature.

* 6 pages, 8 figures, conference

Via

Access Paper or Ask Questions

NatureLM: Deciphering the Language of Nature for Scientific Discovery

Feb 11, 2025

Yingce Xia, Peiran Jin, Shufang Xie, Liang He, Chuan Cao, Renqian Luo, Guoqing Liu, Yue Wang, Zequn Liu, Yuan-Jyue Chen(+35 more)

Figure 1 for NatureLM: Deciphering the Language of Nature for Scientific Discovery

Figure 2 for NatureLM: Deciphering the Language of Nature for Scientific Discovery

Figure 3 for NatureLM: Deciphering the Language of Nature for Scientific Discovery

Figure 4 for NatureLM: Deciphering the Language of Nature for Scientific Discovery

Abstract:Foundation models have revolutionized natural language processing and artificial intelligence, significantly enhancing how machines comprehend and generate human languages. Inspired by the success of these foundation models, researchers have developed foundation models for individual scientific domains, including small molecules, materials, proteins, DNA, and RNA. However, these models are typically trained in isolation, lacking the ability to integrate across different scientific domains. Recognizing that entities within these domains can all be represented as sequences, which together form the "language of nature", we introduce Nature Language Model (briefly, NatureLM), a sequence-based science foundation model designed for scientific discovery. Pre-trained with data from multiple scientific domains, NatureLM offers a unified, versatile model that enables various applications including: (i) generating and optimizing small molecules, proteins, RNA, and materials using text instructions; (ii) cross-domain generation/design, such as protein-to-molecule and protein-to-RNA generation; and (iii) achieving state-of-the-art performance in tasks like SMILES-to-IUPAC translation and retrosynthesis on USPTO-50k. NatureLM offers a promising generalist approach for various scientific tasks, including drug discovery (hit generation/optimization, ADMET optimization, synthesis), novel material design, and the development of therapeutic proteins or nucleotides. We have developed NatureLM models in different sizes (1 billion, 8 billion, and 46.7 billion parameters) and observed a clear improvement in performance as the model size increases.

* 81 pages

Via

Access Paper or Ask Questions

LLM-KT: Aligning Large Language Models with Knowledge Tracing using a Plug-and-Play Instruction

Feb 05, 2025

Ziwei Wang, Jie Zhou, Qin Chen, Min Zhang, Bo Jiang, Aimin Zhou, Qinchun Bai, Liang He

Abstract:The knowledge tracing (KT) problem is an extremely important topic in personalized education, which aims to predict whether students can correctly answer the next question based on their past question-answer records. Prior work on this task mainly focused on learning the sequence of behaviors based on the IDs or textual information. However, these studies usually fail to capture students' sufficient behavioral patterns without reasoning with rich world knowledge about questions. In this paper, we propose a large language models (LLMs)-based framework for KT, named \texttt{\textbf{LLM-KT}}, to integrate the strengths of LLMs and traditional sequence interaction models. For task-level alignment, we design Plug-and-Play instruction to align LLMs with KT, leveraging LLMs' rich knowledge and powerful reasoning capacity. For modality-level alignment, we design the plug-in context and sequence to integrate multiple modalities learned by traditional methods. To capture the long context of history records, we present a plug-in context to flexibly insert the compressed context embedding into LLMs using question-specific and concept-specific tokens. Furthermore, we introduce a plug-in sequence to enhance LLMs with sequence interaction behavior representation learned by traditional sequence models using a sequence adapter. Extensive experiments show that \texttt{\textbf{LLM-KT}} obtains state-of-the-art performance on four typical datasets by comparing it with approximately 20 strong baselines.

Via

Access Paper or Ask Questions

Designing Kresling Origami for Personalised Wrist Orthosis

Jan 30, 2025

Chenying Liu, Shuai Mao, Yixing Lei, Liang He

Abstract:The wrist plays a pivotal role in facilitating motion dexterity and hand functions. Wrist orthoses, from passive braces to active exoskeletons, provide an effective solution for the assistance and rehabilitation of motor abilities. However, the type of motions facilitated by currently available orthoses is limited, with little emphasis on personalised design. To address these gaps, this paper proposes a novel wrist orthosis design inspired by the Kresling origami. The design can be adapted to accommodate various individual shape parameters, which benefits from the topological variations and intrinsic compliance of origami. Heat-sealable fabrics are used to replicate the non-rigid nature of the Kresling origami. The orthosis is capable of six distinct motion modes with a detachable tendon-based actuation system. Experimental characterisation of the workspace has been conducted by activating tendons individually. The maximum bending angle in each direction ranges from 18.81{\deg} to 32.63{\deg}. When tendons are pulled in combination, the maximum bending angles in the dorsal, palmar, radial, and ulnar directions are 31.66{\deg}, 30.38{\deg}, 27.14{\deg}, and 14.92{\deg}, respectively. The capability to generate complex motions such as the dart-throwing motion and circumduction has also been experimentally validated. The work presents a promising foundation for the development of personalised wrist orthoses for training and rehabilitation.

* Accepted for publication in the 2025 IEEE/RAS International Conference on Soft Robotics

Via

Access Paper or Ask Questions

Disentangled Modeling of Preferences and Social Influence for Group Recommendation

Jan 20, 2025

Guangze Ye, Wen Wu, Guoqing Wang, Xi Chen, Hong Zheng, Liang He

Abstract:The group recommendation (GR) aims to suggest items for a group of users in social networks. Existing work typically considers individual preferences as the sole factor in aggregating group preferences. Actually, social influence is also an important factor in modeling users' contributions to the final group decision. However, existing methods either neglect the social influence of individual members or bundle preferences and social influence together as a unified representation. As a result, these models emphasize the preferences of the majority within the group rather than the actual interaction items, which we refer to as the preference bias issue in GR. Moreover, the self-supervised learning (SSL) strategies they designed to address the issue of group data sparsity fail to account for users' contextual social weights when regulating group representations, leading to suboptimal results. To tackle these issues, we propose a novel model based on Disentangled Modeling of Preferences and Social Influence for Group Recommendation (DisRec). Concretely, we first design a user-level disentangling network to disentangle the preferences and social influence of group members with separate embedding propagation schemes based on (hyper)graph convolution networks. We then introduce a socialbased contrastive learning strategy, selectively excluding user nodes based on their social importance to enhance group representations and alleviate the group-level data sparsity issue. The experimental results demonstrate that our model significantly outperforms state-of-the-art methods on two realworld datasets.

* AAAI 2025 Oral

Via

Access Paper or Ask Questions

Latent-space adversarial training with post-aware calibration for defending large language models against jailbreak attacks

Jan 18, 2025

Xin Yi, Yue Li, Linlin Wang, Xiaoling Wang, Liang He

Abstract:Ensuring safety alignment has become a critical requirement for large language models (LLMs), particularly given their widespread deployment in real-world applications. However, LLMs remain susceptible to jailbreak attacks, which exploit system vulnerabilities to bypass safety measures and generate harmful outputs. Although numerous defense mechanisms based on adversarial training have been proposed, a persistent challenge lies in the exacerbation of over-refusal behaviors, which compromise the overall utility of the model. To address these challenges, we propose a Latent-space Adversarial Training with Post-aware Calibration (LATPC) framework. During the adversarial training phase, LATPC compares harmful and harmless instructions in the latent space and extracts safety-critical dimensions to construct refusal features attack, precisely simulating agnostic jailbreak attack types requiring adversarial mitigation. At the inference stage, an embedding-level calibration mechanism is employed to alleviate over-refusal behaviors with minimal computational overhead. Experimental results demonstrate that, compared to various defense methods across five types of jailbreak attacks, LATPC framework achieves a superior balance between safety and utility. Moreover, our analysis underscores the effectiveness of extracting safety-critical dimensions from the latent space for constructing robust refusal feature attacks.

* Under Review

Via

Access Paper or Ask Questions

Aesthetic Matters in Music Perception for Image Stylization: A Emotion-driven Music-to-Visual Manipulation

Jan 03, 2025

Junjie Xu, Xingjiao Wu, Tanren Yao, Zihao Zhang, Jiayang Bei, Wu Wen, Liang He

Figure 1 for Aesthetic Matters in Music Perception for Image Stylization: A Emotion-driven Music-to-Visual Manipulation

Figure 2 for Aesthetic Matters in Music Perception for Image Stylization: A Emotion-driven Music-to-Visual Manipulation

Figure 3 for Aesthetic Matters in Music Perception for Image Stylization: A Emotion-driven Music-to-Visual Manipulation

Figure 4 for Aesthetic Matters in Music Perception for Image Stylization: A Emotion-driven Music-to-Visual Manipulation

Abstract:Emotional information is essential for enhancing human-computer interaction and deepening image understanding. However, while deep learning has advanced image recognition, the intuitive understanding and precise control of emotional expression in images remain challenging. Similarly, music research largely focuses on theoretical aspects, with limited exploration of its emotional dimensions and their integration with visual arts. To address these gaps, we introduce EmoMV, an emotion-driven music-to-visual manipulation method that manipulates images based on musical emotions. EmoMV combines bottom-up processing of music elements-such as pitch and rhythm-with top-down application of these emotions to visual aspects like color and lighting. We evaluate EmoMV using a multi-scale framework that includes image quality metrics, aesthetic assessments, and EEG measurements to capture real-time emotional responses. Our results demonstrate that EmoMV effectively translates music's emotional content into visually compelling images, advancing multimodal emotional integration and opening new avenues for creative industries and interactive technologies.

Via

Access Paper or Ask Questions