Title: Self-Adapting Large Visual-Language Models to Edge Devices across Visual Modalities

Mar 07, 2024

Kaiwen Cai, Zhekai Duan, Gaowen Liu, Charles Fleming, Chris Xiaoxuan Lu

Abstract: Recent advances in Vision-Language (VL) models have sparked interest in deploying them on edge devices, yet three challenges remain: handling diverse visual modalities, the cost of manual annotation, and tight computational constraints. We introduce EdgeVL, a novel framework that addresses these challenges by combining dual-modality knowledge distillation with quantization-aware contrastive learning. This approach adapts large VL models, such as CLIP, for efficient use with both RGB and non-RGB images on resource-limited devices, without requiring manual annotations. EdgeVL not only transfers visual-language alignment capabilities to compact models but also preserves feature quality after quantization, significantly improving open-vocabulary classification performance across visual modalities. Our work represents the first systematic effort to adapt large VL models for edge deployment, showing accuracy improvements of up to 15.4% on multiple datasets and a reduction in model size of up to 93-fold.

* Under review
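
The abstract names two ingredients, feature distillation across two visual modalities and robustness of features to quantization, without giving formulas. The following is a minimal sketch of how such pieces could look, not the authors' implementation. It assumes that a teacher encoder's RGB feature serves as the target for a student's features on both the RGB image and a paired non-RGB image of the same scene, that cosine distance is the feature loss, and that quantization is simulated with symmetric uniform "fake" quantization. All function names are hypothetical, and the contrastive-learning stage is omitted.

```python
import math


def fake_quantize(values, num_bits=8):
    """Simulate symmetric uniform quantization: quantize, then dequantize.

    Assumption for illustration only; the paper's quantizer may differ.
    """
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)
    scale = max_abs / qmax                  # step size of the integer grid
    return [round(v / scale) * scale for v in values]


def l2_normalize(values):
    """Scale a feature vector to unit length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values] if norm > 0 else list(values)


def cosine_distance(a, b):
    """1 - cosine similarity between two feature vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return 1.0 - sum(x * y for x, y in zip(a, b))


def dual_modality_distill_loss(teacher_rgb, student_rgb, student_other):
    """Average distance of both student features to the teacher's RGB feature.

    No labels are used: the teacher feature is the only supervision signal.
    """
    return 0.5 * (
        cosine_distance(teacher_rgb, student_rgb)
        + cosine_distance(teacher_rgb, student_other)
    )


if __name__ == "__main__":
    teacher = [0.2, -0.7, 0.4]              # hypothetical teacher feature (RGB input)
    student_rgb = [0.25, -0.6, 0.35]        # hypothetical student feature (RGB input)
    student_depth = [0.1, -0.8, 0.5]        # hypothetical student feature (non-RGB input)
    # Evaluate the loss on quantized student features to see the effect of quantization.
    loss_fp = dual_modality_distill_loss(teacher, student_rgb, student_depth)
    loss_q = dual_modality_distill_loss(
        teacher, fake_quantize(student_rgb, 4), fake_quantize(student_depth, 4)
    )
    print(f"full-precision loss: {loss_fp:.4f}, 4-bit loss: {loss_q:.4f}")
```

Comparing the loss on full-precision and fake-quantized student features gives a rough sense of how much feature quality a given bit width costs, which is the quantity a quantization-aware training stage would try to keep small.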
