Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Lei Gao

Member, IEEE

AcoustoBots: A swarm of robots for acoustophoretic multimodal interactions

May 12, 2025

Narsimlu Kemsaram, James Hardwick, Jincheng Wang, Bonot Gautam, Ceylan Besevli, Giorgos Christopoulos, Sourabh Dogra, Lei Gao, Akin Delibasi, Diego Martinez Plasencia(+4 more)

Abstract:Acoustophoresis has enabled novel interaction capabilities, such as levitation, volumetric displays, mid-air haptic feedback, and directional sound generation, to open new forms of multimodal interactions. However, its traditional implementation as a singular static unit limits its dynamic range and application versatility. This paper introduces AcoustoBots - a novel convergence of acoustophoresis with a movable and reconfigurable phased array of transducers for enhanced application versatility. We mount a phased array of transducers on a swarm of robots to harness the benefits of multiple mobile acoustophoretic units. This offers a more flexible and interactive platform that enables a swarm of acoustophoretic multimodal interactions. Our novel AcoustoBots design includes a hinge actuation system that controls the orientation of the mounted phased array of transducers to achieve high flexibility in a swarm of acoustophoretic multimodal interactions. In addition, we designed a BeadDispenserBot that can deliver particles to trapping locations, which automates the acoustic levitation interaction. These attributes allow AcoustoBots to independently work for a common cause and interchange between modalities, allowing for novel augmentations (e.g., a swarm of haptics, audio, and levitation) and bilateral interactions with users in an expanded interaction area. We detail our design considerations, challenges, and methodological approach to extend acoustophoretic central control in distributed settings. This work demonstrates a scalable acoustic control framework with two mobile robots, laying the groundwork for future deployment in larger robotic swarms. Finally, we characterize the performance of our AcoustoBots and explore the potential interactive scenarios they can enable.

Via

Access Paper or Ask Questions

DEL: Context-Aware Dynamic Exit Layer for Efficient Self-Speculative Decoding

Apr 08, 2025

Hossein Entezari Zarch, Lei Gao, Chaoyi Jiang, Murali Annavaram

Abstract:Speculative Decoding (SD) is a widely used approach to accelerate the inference of large language models (LLMs) without reducing generation quality. It operates by first using a compact model to draft multiple tokens efficiently, followed by parallel verification using the target LLM. This approach leads to faster inference compared to auto-regressive decoding. While there are multiple approaches to create a draft model, one promising approach is to use early-exit methods. These methods draft candidate tokens by using a subset of layers of the primary model and applying the remaining layers for verification, allowing a single model to handle both drafting and verification. While this technique reduces memory usage and computational cost, its performance relies on the choice of the exit layer for drafting and the number of tokens drafted (speculation length) in each SD round. Prior works use hyperparameter exploration to statically select these values. However, our evaluations show that these hyperparameter values are task-specific, and even within a task they are dependent on the current sequence context. We introduce DEL, a plug-and-play method that adaptively selects the exit layer and speculation length during inference. DEL dynamically tracks the token acceptance rate if the tokens are drafted at each layer of an LLM and uses that knowledge to heuristically select the optimal exit layer and speculation length. Our experiments across a broad range of models and downstream tasks show that DEL achieves overall speedups of $2.16\times$$\sim$$2.50\times$ over vanilla auto-regressive decoding and improves upon the state-of-the-art SD methods by up to $0.27\times$.

Via

Access Paper or Ask Questions

Efficient LLM Inference with I/O-Aware Partial KV Cache Recomputation

Nov 26, 2024

Chaoyi Jiang, Lei Gao, Hossein Entezari Zarch, Murali Annavaram

Abstract:Inference for Large Language Models (LLMs) is computationally demanding. To reduce the cost of auto-regressive decoding, Key-Value (KV) caching is used to store intermediate activations, enabling GPUs to perform only the incremental computation required for each new token. This approach significantly lowers the computational overhead for token generation. However, the memory required for KV caching grows rapidly, often exceeding the capacity of GPU memory. A cost-effective alternative is to offload KV cache to CPU memory, which alleviates GPU memory pressure but shifts the bottleneck to the limited bandwidth of the PCIe connection between the CPU and GPU. Existing methods attempt to address these issues by overlapping GPU computation with I/O or employing CPU-GPU heterogeneous execution, but they are hindered by excessive data movement and dependence on CPU capabilities. In this paper, we introduce an efficient CPU-GPU I/O-aware LLM inference method that avoids transferring the entire KV cache from CPU to GPU by recomputing partial KV cache from activations while concurrently transferring the remaining KV cache via PCIe bus. This approach overlaps GPU recomputation with data transfer to minimize idle GPU time and maximize inference performance. Our method is fully automated by integrating a profiler module that utilizes input characteristics and system hardware information, a scheduler module to optimize the distribution of computation and communication workloads, and a runtime module to efficiently execute the derived execution plan. Experimental results show that our method achieves up to 35.8% lower latency and 46.2% higher throughput during decoding compared to state-of-the-art approaches.

Via

Access Paper or Ask Questions

Dynamic Multi-Objective Lion Swarm Optimization with Multi-strategy Fusion: An application in 6R robot trajectory

May 31, 2024

Bao Liu, Tianbao Liu, Lei Gao, Zhongshuo Hu, Fei Ye

Abstract:The advancement of industrialization has fostered innovative swarm intelligence algorithms, with Lion Swarm Optimization (LSO) being notable for its robustness and efficiency. However, multi-objective variants of LSO struggle with poor initialization, local optima entrapment, and slow adaptation to dynamic environments. This study proposes a Dynamic Multi-Objective Lion Swarm Optimization with Multi-strategy Fusion (MF-DMOLSO) to overcome these challenges. MF-DMOLSO includes an initialization unit using chaotic mapping, a position update unit enhancing behavior patterns based on non-domination and diversity, and an external archive update unit. Evaluations on benchmark functions showed MF-DMOLSO outperformed existing algorithms achieving an accuracy that exceeds the comparison algorithm by 90%. Applied to 6R robot trajectory planning, MF-DMOLSO optimized running time and maximum acceleration to 8.3s and 0.3pi rad/s^2, respectively, achieving a set coverage rate of 70.97% compared to 2% by multi-objective particle swarm optimization, thus improving efficiency and reducing mechanical dither.

Via

Access Paper or Ask Questions

Ethos: Rectifying Language Models in Orthogonal Parameter Space

Apr 01, 2024

Lei Gao, Yue Niu, Tingting Tang, Salman Avestimehr, Murali Annavaram

Figure 1 for Ethos: Rectifying Language Models in Orthogonal Parameter Space

Figure 2 for Ethos: Rectifying Language Models in Orthogonal Parameter Space

Figure 3 for Ethos: Rectifying Language Models in Orthogonal Parameter Space

Figure 4 for Ethos: Rectifying Language Models in Orthogonal Parameter Space

Abstract:Language models (LMs) have greatly propelled the research on natural language processing. However, LMs also raise concerns regarding the generation of biased or toxic content and the potential disclosure of private information from the training dataset. In this work, we present a new efficient approach, Ethos, that rectifies LMs to mitigate toxicity and bias in outputs and avoid privacy leakage. Ethos is built on task arithmetic. However, unlike current task arithmetic algorithms, Ethos distinguishes general beneficial and undesired knowledge when reconstructing task vectors. Specifically, Ethos first obtains a set of principal components from the pre-trained models using singular value decomposition. Then, by projecting the task vector onto principal components, Ethos identifies the principal components that encode general or undesired knowledge. Ethos performs negating using the task vector with undesired knowledge only, thereby minimizing collateral damage on general model utility. We demonstrate the efficacy of our approach on three different tasks: debiasing, detoxification, and memorization unlearning. Evaluations show Ethos is more effective in removing undesired knowledge and maintaining the overall model performance compared to current task arithmetic methods.

Via

Access Paper or Ask Questions

Interpretability of Machine Learning: Recent Advances and Future Prospects

Apr 30, 2023

Lei Gao, Ling Guan

Figure 1 for Interpretability of Machine Learning: Recent Advances and Future Prospects

Figure 2 for Interpretability of Machine Learning: Recent Advances and Future Prospects

Figure 3 for Interpretability of Machine Learning: Recent Advances and Future Prospects

Figure 4 for Interpretability of Machine Learning: Recent Advances and Future Prospects

Abstract:The proliferation of machine learning (ML) has drawn unprecedented interest in the study of various multimedia contents such as text, image, audio and video, among others. Consequently, understanding and learning ML-based representations have taken center stage in knowledge discovery in intelligent multimedia research and applications. Nevertheless, the black-box nature of contemporary ML, especially in deep neural networks (DNNs), has posed a primary challenge for ML-based representation learning. To address this black-box problem, the studies on interpretability of ML have attracted tremendous interests in recent years. This paper presents a survey on recent advances and future prospects on interpretability of ML, with several application examples pertinent to multimedia computing, including text-image cross-modal representation learning, face recognition, and the recognition of objects. It is evidently shown that the study of interpretability of ML promises an important research direction, one which is worth further investment in.

* IEEE Multimedia (Accepted)

Via

Access Paper or Ask Questions

TimelyFL: Heterogeneity-aware Asynchronous Federated Learning with Adaptive Partial Training

Apr 14, 2023

Tuo Zhang, Lei Gao, Sunwoo Lee, Mi Zhang, Salman Avestimehr

Figure 1 for TimelyFL: Heterogeneity-aware Asynchronous Federated Learning with Adaptive Partial Training

Figure 2 for TimelyFL: Heterogeneity-aware Asynchronous Federated Learning with Adaptive Partial Training

Figure 3 for TimelyFL: Heterogeneity-aware Asynchronous Federated Learning with Adaptive Partial Training

Figure 4 for TimelyFL: Heterogeneity-aware Asynchronous Federated Learning with Adaptive Partial Training

Abstract:In cross-device Federated Learning (FL) environments, scaling synchronous FL methods is challenging as stragglers hinder the training process. Moreover, the availability of each client to join the training is highly variable over time due to system heterogeneities and intermittent connectivity. Recent asynchronous FL methods (e.g., FedBuff) have been proposed to overcome these issues by allowing slower users to continue their work on local training based on stale models and to contribute to aggregation when ready. However, we show empirically that this method can lead to a substantial drop in training accuracy as well as a slower convergence rate. The primary reason is that fast-speed devices contribute to many more rounds of aggregation while others join more intermittently or not at all, and with stale model updates. To overcome this barrier, we propose TimelyFL, a heterogeneity-aware asynchronous FL framework with adaptive partial training. During the training, TimelyFL adjusts the local training workload based on the real-time resource capabilities of each client, aiming to allow more available clients to join in the global update without staleness. We demonstrate the performance benefits of TimelyFL by conducting extensive experiments on various datasets (e.g., CIFAR-10, Google Speech, and Reddit) and models (e.g., ResNet20, VGG11, and ALBERT). In comparison with the state-of-the-art (i.e., FedBuff), our evaluations reveal that TimelyFL improves participation rate by 21.13%, harvests 1.28x - 2.89x more efficiency on convergence rate, and provides a 6.25% increment on test accuracy.

* CVPR 2023 FedVision Workshop

Via

Access Paper or Ask Questions

Classification of Lung Pathologies in Neonates using Dual Tree Complex Wavelet Transform

Feb 17, 2023

Sagarjit Aujla, Adel Mohamed, Ryan Tan, Randy Tan, Lei Gao, Naimul Khan, Karthikeyan Umapathy

Abstract:Annually 8500 neonatal deaths are reported in the US due to respiratory failure. Recently, Lung Ultrasound (LUS), due to its radiation free nature, portability, and being cheaper is gaining wide acceptability as a diagnostic tool for lung conditions. However, lack of highly trained medical professionals has limited its use especially in remote areas. To address this, an automated screening system that captures characteristics of the LUS patterns can be of significant assistance to clinicians who are not experts in lung ultrasound (LUS) images. In this paper, we propose a feature extraction method designed to quantify the spatially-localized line patterns and texture patterns found in LUS images. Using the dual-tree complex wavelet transform (DTCWT) and four types of common image features we propose a method to classify the LUS images into 6 common neonatal lung conditions. These conditions are normal lung, pneumothorax (PTX), transient tachypnea of the newborn (TTN), respiratory distress syndrome (RDS), chronic lung disease (CLD) and consolidation (CON) that could be pneumonia or atelectasis. The proposed method using DTCWT decomposition extracted global statistical, grey-level co-occurrence matrix (GLCM), grey-level run length matrix (GLRLM) and linear binary pattern (LBP) features to be fed to a linear discriminative analysis (LDA) based classifier. Using 15 best DTCWT features along with 3 clinical features the proposed approach achieved a per-image classification accuracy of 92.78% with a balanced dataset containing 720 images from 24 patients and 74.39% with the larger unbalanced dataset containing 1550 images from 42 patients. Likewise, the proposed method achieved a maximum per-subject classification accuracy of 81.53% with 43 DTCWT features and 3 clinical features using the balanced dataset and 64.97% with 13 DTCWT features and 3 clinical features using the unbalanced dataset.

* Under review

Via

Access Paper or Ask Questions

A GPU-accelerated Algorithm for Distinct Discriminant Canonical Correlation Network

Sep 26, 2022

Kai Liu, Lei Gao, Ling Guan

Figure 1 for A GPU-accelerated Algorithm for Distinct Discriminant Canonical Correlation Network

Figure 2 for A GPU-accelerated Algorithm for Distinct Discriminant Canonical Correlation Network

Figure 3 for A GPU-accelerated Algorithm for Distinct Discriminant Canonical Correlation Network

Figure 4 for A GPU-accelerated Algorithm for Distinct Discriminant Canonical Correlation Network

Abstract:Currently, deep neural networks (DNNs)-based models have drawn enormous attention and have been utilized to different domains widely. However, due to the data-driven nature, the DNN models may generate unsatisfying performance on the small scale data sets. To address this problem, a distinct discriminant canonical correlation network (DDCCANet) is proposed to generate the deep-level feature representation, producing improved performance on image classification. However, the DDCCANet model was originally implemented on a CPU with computing time on par with state-of-the-art DNN models running on GPUs. In this paper, a GPU-based accelerated algorithm is proposed to further optimize the DDCCANet algorithm. As a result, not only is the performance of DDCCANet guaranteed, but also greatly shortens the calculation time, making the model more applicable in real tasks. To demonstrate the effectiveness of the proposed accelerated algorithm, we conduct experiments on three database with different scales. Experimental results validate the superiority of the proposed accelerated algorithm on given examples.

Via

Access Paper or Ask Questions

Vision Transformer with Convolutions Architecture Search

Mar 20, 2022

Haichao Zhang, Kuangrong Hao, Witold Pedrycz, Lei Gao, Xuesong Tang, Bing Wei

Figure 1 for Vision Transformer with Convolutions Architecture Search

Figure 2 for Vision Transformer with Convolutions Architecture Search

Abstract:Transformers exhibit great advantages in handling computer vision tasks. They model image classification tasks by utilizing a multi-head attention mechanism to process a series of patches consisting of split images. However, for complex tasks, Transformer in computer vision not only requires inheriting a bit of dynamic attention and global context, but also needs to introduce features concerning noise reduction, shifting, and scaling invariance of objects. Therefore, here we take a step forward to study the structural characteristics of Transformer and convolution and propose an architecture search method-Vision Transformer with Convolutions Architecture Search (VTCAS). The high-performance backbone network searched by VTCAS introduces the desirable features of convolutional neural networks into the Transformer architecture while maintaining the benefits of the multi-head attention mechanism. The searched block-based backbone network can extract feature maps at different scales. These features are compatible with a wider range of visual tasks, such as image classification (32 M parameters, 82.0% Top-1 accuracy on ImageNet-1K) and object detection (50.4% mAP on COCO2017). The proposed topology based on the multi-head attention mechanism and CNN adaptively associates relational features of pixels with multi-scale features of objects. It enhances the robustness of the neural network for object recognition, especially in the low illumination indoor scene.

Via

Access Paper or Ask Questions