Junhee Lee

EO-VLM: VLM-Guided Energy Overload Attacks on Vision Models

Apr 11, 2025

Minjae Seo, Myoungsung You, Junhee Lee, Jaehan Kim, Hwanjo Heo, Jintae Oh, Jinwoo Kim

Abstract: Vision models are increasingly deployed in critical applications such as autonomous driving and CCTV monitoring, yet they remain susceptible to resource-consuming attacks. In this paper, we introduce a novel energy-overloading attack that leverages vision language model (VLM) prompts to generate adversarial images targeting vision models. These images, though imperceptible to the human eye, significantly increase GPU energy consumption across various vision models, threatening the availability of these systems. Our framework, EO-VLM (Energy Overload via VLM), is model-agnostic, meaning it is not limited by the architecture or type of the target vision model. By exploiting the lack of safety filters in VLMs like DALL-E 3, we create adversarial noise images without requiring prior knowledge of the target vision models or their internal structure. Our experiments demonstrate up to a 50% increase in energy consumption, revealing a critical vulnerability in current vision models.

* Presented as a poster at ACSAC 2024

VPOcc: Exploiting Vanishing Point for Monocular 3D Semantic Occupancy Prediction

Aug 07, 2024

Junsu Kim, Junhee Lee, Ukcheol Shin, Jean Oh, Kyungdon Joo

Abstract: Monocular 3D semantic occupancy prediction is becoming important in robot vision due to the compactness of a single-RGB-camera setup. However, existing methods often do not adequately account for camera perspective geometry, resulting in information imbalance along the depth range of the image. To address this issue, we propose a vanishing point (VP) guided monocular 3D semantic occupancy prediction framework named VPOcc. Our framework consists of three novel modules utilizing VP. First, the VPZoomer module uses the VP during feature extraction, generating a VP-based zoom-in image so that extracted features are balanced across the scene. Second, we perform perspective geometry-aware feature aggregation by sampling points towards VP using a VP-guided cross-attention (VPCA) module. Finally, we create an information-balanced feature volume by fusing the original and zoom-in voxel feature volumes with a balanced feature volume fusion (BVFV) module. Experiments demonstrate that our method achieves state-of-the-art performance for both IoU and mIoU on SemanticKITTI and SSCBench-KITTI360. These results are obtained by effectively addressing the information imbalance in images through the utilization of VP. Our code will be available at www.github.com/anonymous.
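
For readers unfamiliar with the two metrics named in the abstract, the following is a minimal sketch of how IoU and mIoU are commonly computed for semantic occupancy grids. It is not taken from the paper or its code. The label conventions (`EMPTY = 0` for free space, `IGNORE = 255` for unevaluated voxels), the function names, and the flattened-list input format are all assumptions made for illustration.

```python
from typing import Sequence

EMPTY = 0     # assumed label for free space (not confirmed by the source)
IGNORE = 255  # assumed label for voxels excluded from evaluation


def occupancy_iou(pred: Sequence[int], target: Sequence[int]) -> float:
    """Binary IoU: occupied vs. empty, regardless of semantic class."""
    inter = union = 0
    for p, t in zip(pred, target):
        if t == IGNORE:
            continue
        p_occ, t_occ = p != EMPTY, t != EMPTY
        inter += p_occ and t_occ
        union += p_occ or t_occ
    return inter / union if union else 0.0


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean of per-class IoU over semantic classes 1..num_classes-1.

    Classes absent from both prediction and ground truth are skipped.
    """
    ious = []
    for c in range(1, num_classes):
        inter = union = 0
        for p, t in zip(pred, target):
            if t == IGNORE:
                continue
            inter += (p == c) and (t == c)
            union += (p == c) or (t == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six voxels flattened into a list, two semantic classes.
pred = [0, 1, 1, 2, 2, 0]
target = [0, 1, 2, 2, 255, 1]
iou = occupancy_iou(pred, target)
miou = mean_iou(pred, target, num_classes=3)
```

In this toy case the occupancy IoU only asks whether a voxel is filled, while mIoU also penalizes the voxel whose class is wrong, so the two numbers differ.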
