Rui Kong

Shanghai Jiao Tong University, Shanghai, China

V-LoRA: An Efficient and Flexible System Boosts Vision Applications with LoRA LMM

Nov 01, 2024

Liang Mi, Weijun Wang, Wenming Tu, Qingfeng He, Rui Kong, Xinyu Fang, Yazhu Dong, Yikang Zhang, Yunchun Li, Meng Li(+3 more)

Abstract:Large Multimodal Models (LMMs) have shown significant progress in various complex vision tasks with the solid linguistic and reasoning capacity inherited from large language models (LLMs). Low-rank adaptation (LoRA) offers a promising method to integrate external knowledge into LMMs, compensating for their limitations on domain-specific tasks. However, existing LoRA model serving is excessively computationally expensive and causes extremely high latency. In this paper, we present an end-to-end solution that empowers diverse vision tasks and enriches vision applications with LoRA LMMs. Our system, VaLoRA, enables accurate and efficient vision tasks by 1) an accuracy-aware LoRA adapter generation approach that generates LoRA adapters rich in domain-specific knowledge to meet application-specific accuracy requirements, 2) an adaptive-tiling LoRA adapters batching operator that efficiently computes concurrent heterogeneous LoRA adapters, and 3) a flexible LoRA adapter orchestration mechanism that manages application requests and LoRA adapters to achieve the lowest average response latency. We prototype VaLoRA on five popular vision tasks on three LMMs. Experiment results reveal that VaLoRA improves accuracy by 24-62% compared to the original LMMs and reduces latency by 20-89% compared to state-of-the-art LoRA model serving systems.

LoRA-Switch: Boosting the Efficiency of Dynamic LLM Adapters via System-Algorithm Co-design

May 28, 2024

Rui Kong, Qiyang Li, Xinyu Fang, Qingtian Feng, Qingfeng He, Yazhu Dong, Weijun Wang, Yuanchun Li, Linghe Kong, Yunxin Liu

Abstract:Recent literature has found that an effective method to customize or further improve large language models (LLMs) is to add dynamic adapters, such as low-rank adapters (LoRA) with Mixture-of-Experts (MoE) structures. Though such dynamic adapters incur modest computational complexity, they surprisingly lead to huge inference latency overhead, slowing down the decoding speed by 2.5+ times. In this paper, we analyze the fine-grained costs of the dynamic adapters and find that the fragmented CUDA kernel calls are the root cause. Therefore, we propose LoRA-Switch, a system-algorithm co-designed architecture for efficient dynamic adapters. Unlike most existing dynamic structures that adopt layer-wise or block-wise dynamic routing, LoRA-Switch introduces a token-wise routing mechanism. It switches the LoRA adapters and weights for each token and merges them into the backbone for inference. For efficiency, this switching is implemented with an optimized CUDA kernel, which fuses the merging operations for all LoRA adapters at once. Based on experiments with popular open-source LLMs on common benchmarks, our approach has demonstrated similar accuracy improvement as existing dynamic adapters, while reducing the decoding latency by more than 2.4 times.
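The merging step mentioned in the abstract can be illustrated with a minimal sketch: folding a low-rank update into a dense weight, W' = W + scale * B A, so that inference afterwards needs only one matrix product. This is plain Python on toy matrices, not the paper's fused CUDA kernel or its token-wise routing; the names `merge_lora`, `apply`, and `scale` are illustrative assumptions.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "inner dimensions must match"
    cols = len(b[0])
    return [
        [sum(a[i][t] * b[t][j] for t in range(inner)) for j in range(cols)]
        for i in range(len(a))
    ]


def merge_lora(w: Matrix, lora_b: Matrix, lora_a: Matrix, scale: float = 1.0) -> Matrix:
    """Return W + scale * (B @ A); the input W is left unmodified."""
    delta = matmul(lora_b, lora_a)
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


def apply(w: Matrix, x: List[float]) -> List[float]:
    """Compute y = W x for a single input vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


# Toy example (hypothetical values): 2x2 backbone weight, rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]   # shape (2, 1)
A = [[0.5, -0.5]]    # shape (1, 2)
W_merged = merge_lora(W, B, A, scale=2.0)
y = apply(W_merged, [1.0, 3.0])
```

Once merged, the adapter adds no extra matrix products per token; the cost moves to the merge itself, which is the operation the abstract says is fused into a single kernel.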

Data-Driven Stability Assessment of Power Electronic Converters with Multi-Resolution Dynamic Mode Decomposition

Apr 15, 2024

Rui Kong, Subham Sahoo, Yongjie Liu, Frede Blaabjerg

Abstract:Harmonic instability occurs frequently in the power electronic converter system. This paper leverages multi-resolution dynamic mode decomposition (MR-DMD) as a data-driven diagnostic tool for the system stability of power electronic converters, not requiring complex modeling and detailed control information. By combining dynamic mode decomposition (DMD) with the multi-resolution analysis used in wavelet theory, dynamic modes and eigenvalues can be identified at different decomposition levels and time scales with the MR-DMD algorithm, thereby allowing for handling datasets with transient time behaviors, which is not achievable using conventional DMD. Further, the selection criteria for important parameters in MR-DMD are clearly defined through derivation, elucidating the reason for enabling it to extract eigenvalues within different frequency ranges. Finally, the analysis results are verified using the dataset collected from the experimental platform of a low-frequency oscillation scenario in electrified railways featuring a single-phase converter.
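For readers unfamiliar with DMD, the sketch below shows the basic idea on a two-state toy system: fit a linear map A from consecutive snapshots by least squares, then read stability off its eigenvalues (for discrete-time data, all |λ| < 1 means decaying modes). This is plain single-resolution DMD, not the MR-DMD algorithm of the paper, and the synthetic damped oscillation (`r`, `theta`) is an assumption made only for illustration.

```python
import cmath
import math
from typing import List, Tuple

Vec = Tuple[float, float]


def fit_dmd_2x2(snapshots: List[Vec]) -> List[List[float]]:
    """Least-squares fit of A in x[k+1] ~= A x[k] for 2-state snapshots."""
    g = [[0.0, 0.0], [0.0, 0.0]]  # G = X X^T
    c = [[0.0, 0.0], [0.0, 0.0]]  # C = Y X^T, with Y the shifted snapshots
    for x, y in zip(snapshots[:-1], snapshots[1:]):
        for i in range(2):
            for j in range(2):
                g[i][j] += x[i] * x[j]
                c[i][j] += y[i] * x[j]
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    if abs(det) < 1e-12:
        raise ValueError("snapshots do not span the state space")
    g_inv = [[g[1][1] / det, -g[0][1] / det],
             [-g[1][0] / det, g[0][0] / det]]
    # A = C G^{-1}
    return [[sum(c[i][t] * g_inv[t][j] for t in range(2)) for j in range(2)]
            for i in range(2)]


def eigenvalues_2x2(a: List[List[float]]) -> Tuple[complex, complex]:
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    tr = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    disc = cmath.sqrt(tr * tr - 4.0 * det)
    return (tr + disc) / 2.0, (tr - disc) / 2.0


# Synthetic damped oscillation: rotation by theta, shrinking by r each step.
r, theta = 0.9, 0.3
a_true = [[r * math.cos(theta), -r * math.sin(theta)],
          [r * math.sin(theta), r * math.cos(theta)]]
x: Vec = (1.0, 0.0)
snaps = [x]
for _ in range(20):
    x = (a_true[0][0] * x[0] + a_true[0][1] * x[1],
         a_true[1][0] * x[0] + a_true[1][1] * x[1])
    snaps.append(x)

a_hat = fit_dmd_2x2(snaps)
eigs = eigenvalues_2x2(a_hat)
stable = all(abs(lam) < 1.0 for lam in eigs)
```

On noise-free data the recovered eigenvalue magnitudes match `r`; real measurements would typically call for an SVD-based, rank-truncated fit rather than the direct 2x2 inverse used here.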

A Gray-Box Stability Analysis Mechanism for Power Electronic Converters

Apr 15, 2024

Rui Kong, Subham Sahoo, Yubo Song, Frede Blaabjerg

Abstract:This paper proposes a gray-box stability analysis mechanism based on data-driven dynamic mode decomposition (DMD) for commercial grid-tied power electronics converters with limited information on their control parameters and topology. By fusing the underlying physical constraints of the state equations into data snapshots, the system dynamic state matrix and input matrix are simultaneously approximated to identify the dominant system dynamic modes and eigenvalues using the DMD with control (DMDc) algorithm. While retaining the advantage of not requiring intrinsic controller information, the proposed gray-box method achieves higher accuracy and more interpretable outcomes than the conventional DMD method. Finally, under experimental conditions of a low-frequency oscillation scenario in electrified railways featuring a single-phase converter, the proposed gray-box DMDc is verified to identify the dominant eigenvalues more accurately.

Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security

Jan 10, 2024

Yuanchun Li, Hao Wen, Weijun Wang, Xiangyu Li, Yizhen Yuan, Guohong Liu, Jiacheng Liu, Wenxing Xu, Xiang Wang, Yi Sun(+15 more)

Abstract:Since the advent of personal computing devices, intelligent personal assistants (IPAs) have been one of the key technologies that researchers and engineers have focused on, aiming to help users efficiently obtain information and execute tasks, and provide users with more intelligent, convenient, and rich interaction experiences. With the development of smartphones and IoT, computing and sensing devices have become ubiquitous, greatly expanding the boundaries of IPAs. However, due to the lack of capabilities such as user intent understanding, task planning, tool use, and personal data management, existing IPAs still have limited practicality and scalability. Recently, the emergence of foundation models, represented by large language models (LLMs), has brought new opportunities for the development of IPAs. With their powerful semantic understanding and reasoning capabilities, LLMs can enable intelligent agents to solve complex problems autonomously. In this paper, we focus on Personal LLM Agents, which are LLM-based agents that are deeply integrated with personal data and personal devices and used for personal assistance. We envision that Personal LLM Agents will become a major software paradigm for end-users in the upcoming era. To realize this vision, we take the first step to discuss several important questions about Personal LLM Agents, including their architecture, capability, efficiency and security. We start by summarizing the key components and design choices in the architecture of Personal LLM Agents, followed by an in-depth analysis of the opinions collected from domain experts. Next, we discuss several key challenges to achieve intelligent, efficient and secure Personal LLM Agents, followed by a comprehensive survey of representative solutions to address these challenges.

* https://github.com/MobileLLM/Personal_LLM_Agents_Survey

ACT: Empowering Decision Transformer with Dynamic Programming via Advantage Conditioning

Sep 12, 2023

Chenxiao Gao, Chenyang Wu, Mingjun Cao, Rui Kong, Zongzhang Zhang, Yang Yu

Abstract:Decision Transformer (DT), which employs expressive sequence modeling techniques to perform action generation, has emerged as a promising approach to offline policy optimization. However, DT generates actions conditioned on a desired future return, which is known to bear some weaknesses such as the susceptibility to environmental stochasticity. To overcome DT's weaknesses, we propose to empower DT with dynamic programming. Our method comprises three steps. First, we employ in-sample value iteration to obtain approximated value functions, which involves dynamic programming over the MDP structure. Second, we evaluate action quality in context with estimated advantages. We introduce two types of advantage estimators, IAE and GAE, which are suitable for different tasks. Third, we train an Advantage-Conditioned Transformer (ACT) to generate actions conditioned on the estimated advantages. Finally, during testing, ACT generates actions conditioned on a desired advantage. Our evaluation results validate that, by leveraging the power of dynamic programming, ACT demonstrates effective trajectory stitching and robust action generation in spite of the environmental stochasticity, outperforming baseline methods across various benchmarks. Additionally, we conduct an in-depth analysis of ACT's various design choices through ablation studies.

Serving MoE Models on Resource-constrained Edge Devices via Dynamic Expert Swapping

Aug 29, 2023

Rui Kong, Yuanchun Li, Qingtian Feng, Weijun Wang, Linghe Kong, Yunxin Liu

Abstract:Mixture of experts (MoE) is a popular technique in deep learning that improves model capacity with conditionally-activated parallel neural network modules (experts). However, serving MoE models in resource-constrained latency-critical edge scenarios is challenging due to the significantly increased model size and complexity. In this paper, we first analyze the behavior pattern of MoE models in continuous inference scenarios, which leads to three key observations about the expert activations, including temporal locality, exchangeability, and skippable computation. Based on these observations, we introduce PC-MoE, an inference framework for resource-constrained continuous MoE model serving. The core of PC-MoE is a new data structure, Parameter Committee, that intelligently maintains a subset of important experts in use to reduce resource consumption. The optimal configuration of Parameter Committee is found offline by a profiling-guided committee planner, and expert swapping and request handling at runtime are managed by an adaptive committee scheduler. To evaluate the effectiveness of PC-MoE, we conduct experiments using state-of-the-art MoE models on common computer vision and natural language processing tasks. The results demonstrate optimal trade-offs between resource consumption and model accuracy achieved by PC-MoE. For instance, on object detection tasks with the Swin-MoE model, our approach can reduce memory usage and latency by 42.34% and 18.63% with only 0.10% accuracy degradation.
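The temporal-locality observation suggests keeping only a small set of experts resident and swapping others in on demand. The sketch below illustrates that general idea with a plain least-recently-used cache. It is a deliberate simplification and not PC-MoE's Parameter Committee, planner, or scheduler; `ExpertCache`, `load_expert`, and the request trace are hypothetical.

```python
from collections import OrderedDict
from typing import Callable, List


class ExpertCache:
    """Keep at most `capacity` experts resident, evicting the least recently used."""

    def __init__(self, capacity: int, load_expert: Callable[[int], object]):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.load_expert = load_expert  # stands in for a slow load from storage
        self._resident: "OrderedDict[int, object]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, expert_id: int) -> object:
        if expert_id in self._resident:
            # Recently used experts move to the back, away from eviction.
            self._resident.move_to_end(expert_id)
            self.hits += 1
            return self._resident[expert_id]
        self.misses += 1
        if len(self._resident) >= self.capacity:
            self._resident.popitem(last=False)  # drop the least recently used
        expert = self.load_expert(expert_id)
        self._resident[expert_id] = expert
        return expert

    def resident_ids(self) -> List[int]:
        """Resident expert ids, least recently used first."""
        return list(self._resident)


# Hypothetical trace of expert activations across consecutive requests.
cache = ExpertCache(capacity=2, load_expert=lambda i: f"expert-{i}")
for expert_id in [0, 1, 0, 0, 2, 0, 1]:
    cache.get(expert_id)
```

With this trace and a capacity of two, three of the seven activations are served from memory; the more the activations cluster in time, the higher that fraction becomes.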

PatchBackdoor: Backdoor Attack against Deep Neural Networks without Model Modification

Aug 22, 2023

Yizhen Yuan, Rui Kong, Shenghao Xie, Yuanchun Li, Yunxin Liu

Abstract:Backdoor attacks are a major threat to deep learning systems in safety-critical scenarios, aiming to trigger misbehavior of neural network models under attacker-controlled conditions. However, most backdoor attacks have to modify the neural network models through training with poisoned data and/or direct model editing, which leads to a common but false belief that backdoor attacks can be easily avoided by properly protecting the model. In this paper, we show that backdoor attacks can be achieved without any model modification. Instead of injecting backdoor logic into the training data or the model, we propose to place a carefully-designed patch (namely backdoor patch) in front of the camera, which is fed into the model together with the input images. The patch can be trained to behave normally most of the time, while producing wrong predictions when the input image contains an attacker-controlled trigger object. Our main techniques include an effective training method to generate the backdoor patch and a digital-physical transformation modeling method to enhance the feasibility of the patch in real deployments. Extensive experiments show that PatchBackdoor can be applied to common deep learning models (VGG, MobileNet, ResNet) with an attack success rate of 93% to 99% on classification tasks. Moreover, we implement PatchBackdoor in real-world scenarios and show that the attack is still threatening.

* accepted by ACM MM 2023
