Sharon Hu

NVCiM-PT: An NVCiM-assisted Prompt Tuning Framework for Edge LLMs

Nov 12, 2024

Ruiyang Qin, Pengyu Ren, Zheyu Yan, Liu Liu, Dancheng Liu, Amir Nassereldine, Jinjun Xiong, Kai Ni, Sharon Hu, Yiyu Shi

Abstract: Large Language Models (LLMs) deployed on edge devices, known as edge LLMs, need to continuously fine-tune their model parameters from user-generated data under limited resource constraints. However, most existing learning methods are not applicable to edge LLMs because they either demand substantial resources or offer low learning capacity. Prompt tuning (PT) has recently emerged as an effective fine-tuning method for edge LLMs because it modifies only a small portion of LLM parameters, but it suffers from user domain shifts, which lead to repetitive training and a loss of resource efficiency. Conventional techniques for addressing domain shift often involve complex neural networks and sophisticated training, which are incompatible with PT for edge LLMs. Therefore, an open research question is how to address domain shift issues for edge LLMs with limited resources. In this paper, we propose a prompt tuning framework for edge LLMs, exploiting the benefits offered by non-volatile computing-in-memory (NVCiM) architectures. We introduce a novel NVCiM-assisted PT framework, where we narrow down the core operations to matrix-matrix multiplication, which can then be accelerated by performing in-situ computation on NVCiM. To the best of our knowledge, this is the first work employing NVCiM to improve the edge LLM PT performance.

* Accepted by DATE 2025

Quantization of Fully Convolutional Networks for Accurate Biomedical Image Segmentation

Mar 13, 2018

Xiaowei Xu, Qing Lu, Yu Hu, Lin Yang, Sharon Hu, Danny Chen, Yiyu Shi

Abstract: With pervasive applications of medical imaging in health-care, biomedical image segmentation plays a central role in quantitative analysis, clinical diagnosis, and medical intervention. Since manual annotation suffers limited reproducibility, arduous efforts, and excessive time, automatic segmentation is desired to process increasingly larger scale histopathological data. Recently, deep neural networks (DNNs), particularly fully convolutional networks (FCNs), have been widely applied to biomedical image segmentation, attaining much improved performance. At the same time, quantization of DNNs has become an active research topic, which aims to represent weights with less memory (precision) to considerably reduce memory and computation requirements of DNNs while maintaining acceptable accuracy. In this paper, we apply quantization techniques to FCNs for accurate biomedical image segmentation. Unlike existing literature on quantization which primarily targets memory and computation complexity reduction, we apply quantization as a method to reduce overfitting in FCNs for better accuracy. Specifically, we focus on a state-of-the-art segmentation framework, suggestive annotation [22], which judiciously extracts representative annotation samples from the original training dataset, obtaining an effective small-sized balanced training dataset. We develop two new quantization processes for this framework: (1) suggestive annotation with quantization for highly representative training samples, and (2) network training with quantization for high accuracy. Extensive experiments on the MICCAI Gland dataset show that both quantization processes can improve the segmentation performance, and our proposed method exceeds the current state-of-the-art performance by up to 1%. In addition, our method has a reduction of up to 6.4x on memory usage.

* 9 pages, 11 Figs, 1 Table, Accepted by CVPR
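
As a rough illustration of what "representing weights with less memory (precision)" means, the sketch below applies generic symmetric uniform quantization to a short list of weights. It is not the paper's method: the function names, the 8-bit width, and the 32-bit baseline are assumptions chosen only for the example.

```python
def quantize_uniform(weights, bits):
    """Map floats to signed integer codes of `bits` bits (generic sketch)."""
    if bits < 2:
        raise ValueError("bits must be at least 2")
    qmax = 2 ** (bits - 1) - 1                  # largest code, e.g. 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from integer codes."""
    return [c * scale for c in codes]


def memory_reduction(full_bits, quant_bits):
    """Ratio of storage per weight before and after quantization."""
    return full_bits / quant_bits


# Hypothetical weights, used only to exercise the functions.
weights = [0.5, -1.0, 0.25, 0.75]
codes, scale = quantize_uniform(weights, bits=8)
restored = dequantize(codes, scale)
print(codes, restored, memory_reduction(32, 8))
```

Each restored weight differs from the original by at most half a quantization step (`scale / 2`), and storage per weight shrinks by the ratio of the bit widths. The abstract's point is that this loss of precision can also act as a regularizer against overfitting.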
