Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Minji Son

SAFIRE: Segment Any Forged Image Region

Dec 11, 2024

Myung-Joon Kwon, Wonjun Lee, Seung-Hun Nam, Minji Son, Changick Kim

Figure 1 for SAFIRE: Segment Any Forged Image Region

Figure 2 for SAFIRE: Segment Any Forged Image Region

Figure 3 for SAFIRE: Segment Any Forged Image Region

Figure 4 for SAFIRE: Segment Any Forged Image Region

Abstract:Most techniques approach the problem of image forgery localization as a binary segmentation task, training neural networks to label original areas as 0 and forged areas as 1. In contrast, we tackle this issue from a more fundamental perspective by partitioning images according to their originating sources. To this end, we propose Segment Any Forged Image Region (SAFIRE), which solves forgery localization using point prompting. Each point on an image is used to segment the source region containing itself. This allows us to partition images into multiple source regions, a capability achieved for the first time. Additionally, rather than memorizing certain forgery traces, SAFIRE naturally focuses on uniform characteristics within each source region. This approach leads to more stable and effective learning, achieving superior performance in both the new task and the traditional binary forgery localization.

* Proceedings of the AAAI Conference on Artificial Intelligence, 2025
* Accepted at AAAI 2025. Code is available at: https://github.com/mjkwon2021/SAFIRE

Via

Access Paper or Ask Questions

Breaking Temporal Consistency: Generating Video Universal Adversarial Perturbations Using Image Models

Nov 17, 2023

Hee-Seon Kim, Minji Son, Minbeom Kim, Myung-Joon Kwon, Changick Kim

Abstract:As video analysis using deep learning models becomes more widespread, the vulnerability of such models to adversarial attacks is becoming a pressing concern. In particular, Universal Adversarial Perturbation (UAP) poses a significant threat, as a single perturbation can mislead deep learning models on entire datasets. We propose a novel video UAP using image data and image model. This enables us to take advantage of the rich image data and image model-based studies available for video applications. However, there is a challenge that image models are limited in their ability to analyze the temporal aspects of videos, which is crucial for a successful video attack. To address this challenge, we introduce the Breaking Temporal Consistency (BTC) method, which is the first attempt to incorporate temporal information into video attacks using image models. We aim to generate adversarial videos that have opposite patterns to the original. Specifically, BTC-UAP minimizes the feature similarity between neighboring frames in videos. Our approach is simple but effective at attacking unseen video models. Additionally, it is applicable to videos of varying lengths and invariant to temporal shifts. Our approach surpasses existing methods in terms of effectiveness on various datasets, including ImageNet, UCF-101, and Kinetics-400.

* ICCV 2023

Via

Access Paper or Ask Questions

Sketch-based Video Object Localization

Apr 02, 2023

Sangmin Woo, So-Yeong Jeon, Jinyoung Park, Minji Son, Sumin Lee, Changick Kim

Figure 1 for Sketch-based Video Object Localization

Figure 2 for Sketch-based Video Object Localization

Figure 3 for Sketch-based Video Object Localization

Figure 4 for Sketch-based Video Object Localization

Abstract:We introduce Sketch-based Video Object Localization (SVOL), a new task aimed at localizing spatio-temporal object boxes in video queried by the input sketch. We first outline the challenges in the SVOL task and build the Sketch-Video Attention Network (SVANet) with the following design principles: (i) to consider temporal information of video and bridge the domain gap between sketch and video; (ii) to accurately identify and localize multiple objects simultaneously; (iii) to handle various styles of sketches; (iv) to be classification-free. In particular, SVANet is equipped with a Cross-modal Transformer that models the interaction between learnable object tokens, query sketch, and video through attention operations, and learns upon a per-frame set macthing strategy that enables frame-wise prediction while utilizing global video context. We evaluate SVANet on a newly curated SVOL dataset. By design, SVANet successfully learns the mapping between the query sketch and video objects, achieving state-of-the-art results on the SVOL benchmark. We further confirm the effectiveness of SVANet via extensive ablation studies and visualizations. Lastly, we demonstrate its zero-shot capability on unseen datasets and novel categories, suggesting its high scalability in real-world applications.

Via

Access Paper or Ask Questions