Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jingyi Li

Training-Free Multi-Step Audio Source Separation

May 26, 2025

Yongyi Zang, Jingyi Li, Qiuqiang Kong

Abstract:Audio source separation aims to separate a mixture into target sources. Previous audio source separation systems usually conduct one-step inference, which does not fully explore the separation ability of models. In this work, we reveal that pretrained one-step audio source separation models can be leveraged for multi-step separation without additional training. We propose a simple yet effective inference method that iteratively applies separation by optimally blending the input mixture with the previous step's separation result. At each step, we determine the optimal blending ratio by maximizing a metric. We prove that our method always yield improvement over one-step inference, provide error bounds based on model smoothness and metric robustness, and provide theoretical analysis connecting our method to denoising along linear interpolation paths between noise and clean distributions, a property we link to denoising diffusion bridge models. Our approach effectively delivers improved separation performance as a "free lunch" from existing models. Our empirical results demonstrate that our multi-step separation approach consistently outperforms one-step inference across both speech enhancement and music source separation tasks, and can achieve scaling performance similar to training a larger model, using more data, or in some cases employing a multi-step training objective. These improvements appear not only on the optimization metric during multi-step inference, but also extend to nearly all non-optimized metrics (with one exception). We also discuss limitations of our approach and directions for future research.

Via

Access Paper or Ask Questions

Towards Harnessing the Collaborative Power of Large and Small Models for Domain Tasks

Apr 24, 2025

Yang Liu, Bingjie Yan, Tianyuan Zou, Jianqing Zhang, Zixuan Gu, Jianbing Ding, Xidong Wang, Jingyi Li, Xiaozhou Ye, Ye Ouyang(+2 more)

Abstract:Large language models (LLMs) have demonstrated remarkable capabilities, but they require vast amounts of data and computational resources. In contrast, smaller models (SMs), while less powerful, can be more efficient and tailored to specific domains. In this position paper, we argue that taking a collaborative approach, where large and small models work synergistically, can accelerate the adaptation of LLMs to private domains and unlock new potential in AI. We explore various strategies for model collaboration and identify potential challenges and opportunities. Building upon this, we advocate for industry-driven research that prioritizes multi-objective benchmarks on real-world private datasets and applications.

Via

Access Paper or Ask Questions

Multi-agent Application System in Office Collaboration Scenarios

Mar 26, 2025

Songtao Sun, Jingyi Li, Yuanfei Dong, Haoguang Liu, Chenxin Xu, Fuyang Li, Qiang Liu

Abstract:This paper introduces a multi-agent application system designed to enhance office collaboration efficiency and work quality. The system integrates artificial intelligence, machine learning, and natural language processing technologies, achieving functionalities such as task allocation, progress monitoring, and information sharing. The agents within the system are capable of providing personalized collaboration support based on team members' needs and incorporate data analysis tools to improve decision-making quality. The paper also proposes an intelligent agent architecture that separates Plan and Solver, and through techniques such as multi-turn query rewriting and business tool retrieval, it enhances the agent's multi-intent and multi-turn dialogue capabilities. Furthermore, the paper details the design of tools and multi-turn dialogue in the context of office collaboration scenarios, and validates the system's effectiveness through experiments and evaluations. Ultimately, the system has demonstrated outstanding performance in real business applications, particularly in query understanding, task planning, and tool calling. Looking forward, the system is expected to play a more significant role in addressing complex interaction issues within dynamic environments and large-scale multi-agent systems.

* Technical report

Via

Access Paper or Ask Questions

Constructing a Norm for Children's Scientific Drawing: Distribution Features Based on Semantic Similarity of Large Language Models

Feb 21, 2025

Yi Zhang, Fan Wei, Jingyi Li, Yan Wang, Yanyan Yu, Jianli Chen, Zipo Cai, Xinyu Liu, Wei Wang, Peng Wang(+1 more)

Abstract:The use of children's drawings to examining their conceptual understanding has been proven to be an effective method, but there are two major problems with previous research: 1. The content of the drawings heavily relies on the task, and the ecological validity of the conclusions is low; 2. The interpretation of drawings relies too much on the subjective feelings of the researchers. To address this issue, this study uses the Large Language Model (LLM) to identify 1420 children's scientific drawings (covering 9 scientific themes/concepts), and uses the word2vec algorithm to calculate their semantic similarity. The study explores whether there are consistent drawing representations for children on the same theme, and attempts to establish a norm for children's scientific drawings, providing a baseline reference for follow-up children's drawing research. The results show that the representation of most drawings has consistency, manifested as most semantic similarity greater than 0.8. At the same time, it was found that the consistency of the representation is independent of the accuracy (of LLM's recognition), indicating the existence of consistency bias. In the subsequent exploration of influencing factors, we used Kendall rank correlation coefficient to investigate the effects of Sample Size, Abstract Degree, and Focus Points on drawings, and used word frequency statistics to explore whether children represented abstract themes/concepts by reproducing what was taught in class.

Via

Access Paper or Ask Questions

Pluto and Charon: A Time and Memory Efficient Collaborative Edge AI Framework for Personal LLMs Fine-Tuning

Aug 20, 2024

Bei Ouyang, Shengyuan Ye, Liekang Zeng, Tianyi Qian, Jingyi Li, Xu Chen

Abstract:Large language models (LLMs) have unlocked a plethora of powerful applications at the network edge, such as intelligent personal assistants. Data privacy and security concerns have prompted a shift towards edge-based fine-tuning of personal LLMs, away from cloud reliance. However, this raises issues of computational intensity and resource scarcity, hindering training efficiency and feasibility. While current studies investigate parameter-efficient fine-tuning (PEFT) techniques to mitigate resource constraints, our analysis indicates that these techniques are not sufficiently resource-efficient for edge devices. To tackle these challenges, we propose Pluto and Charon (PAC), a time and memory efficient collaborative edge AI framework for personal LLMs fine-tuning. PAC breaks the resource wall of personal LLMs fine-tuning with a sophisticated algorithm-system co-design. (1) Algorithmically, PAC implements a personal LLMs fine-tuning technique that is efficient in terms of parameters, time, and memory. It utilizes Parallel Adapters to circumvent the need for a full backward pass through the LLM backbone. Additionally, an activation cache mechanism further streamlining the process by negating the necessity for repeated forward passes across multiple epochs. (2) Systematically, PAC leverages edge devices in close proximity, pooling them as a collective resource for in-situ personal LLMs fine-tuning, utilizing a hybrid data and pipeline parallelism to orchestrate distributed training. The use of the activation cache eliminates the need for forward pass through the LLM backbone,enabling exclusive fine-tuning of the Parallel Adapters using data parallelism. Extensive evaluation based on prototype implementation demonstrates that PAC remarkably outperforms state-of-the-art approaches, achieving up to 8.64x end-to-end speedup and up to 88.16% reduction in memory footprint.

* Accepted by The 53rd International Conference on Parallel Processing (ICPP'24)

Via

Access Paper or Ask Questions

Design and Optimization of Hierarchical Gradient Coding for Distributed Learning at Edge Devices

Jun 16, 2024

Weiheng Tang, Jingyi Li, Lin Chen, Xu Chen

Figure 1 for Design and Optimization of Hierarchical Gradient Coding for Distributed Learning at Edge Devices

Figure 2 for Design and Optimization of Hierarchical Gradient Coding for Distributed Learning at Edge Devices

Figure 3 for Design and Optimization of Hierarchical Gradient Coding for Distributed Learning at Edge Devices

Figure 4 for Design and Optimization of Hierarchical Gradient Coding for Distributed Learning at Edge Devices

Abstract:Edge computing has recently emerged as a promising paradigm to boost the performance of distributed learning by leveraging the distributed resources at edge nodes. Architecturally, the introduction of edge nodes adds an additional intermediate layer between the master and workers in the original distributed learning systems, potentially leading to more severe straggler effect. Recently, coding theory-based approaches have been proposed for stragglers mitigation in distributed learning, but the majority focus on the conventional workers-master architecture. In this paper, along a different line, we investigate the problem of mitigating the straggler effect in hierarchical distributed learning systems with an additional layer composed of edge nodes. Technically, we first derive the fundamental trade-off between the computational loads of workers and the stragglers tolerance. Then, we propose a hierarchical gradient coding framework, which provides better stragglers mitigation, to achieve the derived computational trade-off. To further improve the performance of our framework in heterogeneous scenarios, we formulate an optimization problem with the objective of minimizing the expected execution time for each iteration in the learning process. We develop an efficient algorithm to mathematically solve the problem by outputting the optimum strategy. Extensive simulation results demonstrate the superiority of our schemes compared with conventional solutions.

* The paper has been accepted by IEEE Transactions on Communications

Via

Access Paper or Ask Questions

IMFL-AIGC: Incentive Mechanism Design for Federated Learning Empowered by Artificial Intelligence Generated Content

Jun 12, 2024

Guangjing Huang, Qiong Wu, Jingyi Li, Xu Chen

Abstract:Federated learning (FL) has emerged as a promising paradigm that enables clients to collaboratively train a shared global model without uploading their local data. To alleviate the heterogeneous data quality among clients, artificial intelligence-generated content (AIGC) can be leveraged as a novel data synthesis technique for FL model performance enhancement. Due to various costs incurred by AIGC-empowered FL (e.g., costs of local model computation and data synthesis), however, clients are usually reluctant to participate in FL without adequate economic incentives, which leads to an unexplored critical issue for enabling AIGC-empowered FL. To fill this gap, we first devise a data quality assessment method for data samples generated by AIGC and rigorously analyze the convergence performance of FL model trained using a blend of authentic and AI-generated data samples. We then propose a data quality-aware incentive mechanism to encourage clients' participation. In light of information asymmetry incurred by clients' private multi-dimensional attributes, we investigate clients' behavior patterns and derive the server's optimal incentive strategies to minimize server's cost in terms of both model accuracy loss and incentive payments for both complete and incomplete information scenarios. Numerical results demonstrate that our proposed mechanism exhibits highest training accuracy and reduces up to 53.34% of the server's cost with real-world datasets, compared with existing benchmark mechanisms.

* The paper has been accepted by IEEE Transactions on Mobile Computing

Via

Access Paper or Ask Questions

The Multi-fingered Kinematic Model for Dual-arm Manipulation

Jan 14, 2024

Jingyi Li

Abstract:Bimanual manipulation needs robots to be sensitive on the grasp force which is hard to be accurately detected. This paper proposes RL framework for enhancing the grasp quality during the bimanual manipulation. This framework is based on finger configurations and its feedback. After that, the grasp quality is evaluated by the reward mechanism for the hands to determine strategies. There are 2 strategies, simultaneous and interleaved strategies, which will be determined in this framework to manipulate objects. In this paper, the contour and centroid of objects to the robot are unknown. Through the RL framework, robots can perceive hand-object relation and then optimize fingers configurations. The simulations and experiments showed that this framework can improve the success rates and finger motion accuracy.

* arXiv admin note: text overlap with arXiv:2401.06610

Via

Access Paper or Ask Questions

The Hand-object Kinematic Model for Bimanual Manipulation

Jan 12, 2024

Jingyi Li

Abstract:This paper addresses the planar finger kinematics for seeking optimized manipulation strategies. The first step is to model based on geometric features of linear and rotation motion so that the robot can select the fingers configurations. This kinematic model considers the motion between hands and object. Based on 2-finger manipulation cases, this model can output the strategies for bimanual manipulation. For executing strategies, the second step is to seek the appropriate values of finger joints according to the ending orientation of fingers. The simulation shows that the computed solutions can complete the relative rotation and linear motion of unknown objects.

Via

Access Paper or Ask Questions

Roulette: A Semantic Privacy-Preserving Device-Edge Collaborative Inference Framework for Deep Learning Classification Tasks

Sep 06, 2023

Jingyi Li, Guocheng Liao, Lin Chen, Xu Chen

Abstract:Deep learning classifiers are crucial in the age of artificial intelligence. The device-edge-based collaborative inference has been widely adopted as an efficient framework for promoting its applications in IoT and 5G/6G networks. However, it suffers from accuracy degradation under non-i.i.d. data distribution and privacy disclosure. For accuracy degradation, direct use of transfer learning and split learning is high cost and privacy issues remain. For privacy disclosure, cryptography-based approaches lead to a huge overhead. Other lightweight methods assume that the ground truth is non-sensitive and can be exposed. But for many applications, the ground truth is the user's crucial privacy-sensitive information. In this paper, we propose a framework of Roulette, which is a task-oriented semantic privacy-preserving collaborative inference framework for deep learning classifiers. More than input data, we treat the ground truth of the data as private information. We develop a novel paradigm of split learning where the back-end DNN is frozen and the front-end DNN is retrained to be both a feature extractor and an encryptor. Moreover, we provide a differential privacy guarantee and analyze the hardness of ground truth inference attacks. To validate the proposed Roulette, we conduct extensive performance evaluations using realistic datasets, which demonstrate that Roulette can effectively defend against various attacks and meanwhile achieve good model accuracy. In a situation where the non-i.i.d. is very severe, Roulette improves the inference accuracy by 21\% averaged over benchmarks, while making the accuracy of discrimination attacks almost equivalent to random guessing.

Via

Access Paper or Ask Questions