Abstract:The conventional targeted adversarial attacks add a small perturbation to an image to make neural network models estimate the image as a predefined target class, even if it is not the correct target class. Recently, for visual-language models (VLMs), the focus of targeted adversarial attacks is to generate a perturbation that makes VLMs answer intended target text outputs. For example, they aim to make a small perturbation on an image to make VLMs' answers change from "there is an apple" to "there is a baseball." However, answering just intended text outputs is insufficient for tricky questions like "if there is a baseball, tell me what is below it." This is because the target of the adversarial attacks does not consider the overall integrity of the original image, thereby leading to a lack of visual reasoning. In this work, we focus on generating targeted adversarial examples with visual reasoning against VLMs. To this end, we propose 1) a novel adversarial attack procedure -- namely, Replace-then-Perturb and 2) a contrastive learning-based adversarial loss -- namely, Contrastive-Adv. In Replace-then-Perturb, we first leverage a text-guided segmentation model to find the target object in the image. Then, we get rid of the target object and inpaint the empty space with the desired prompt. By doing this, we can generate a target image corresponding to the desired prompt, while maintaining the overall integrity of the original image. Furthermore, in Contrastive-Adv, we design a novel loss function to obtain better adversarial examples. Our extensive benchmark results demonstrate that Replace-then-Perturb and Contrastive-Adv outperform the baseline adversarial attack algorithms. We note that the source code to reproduce the results will be available.
Abstract:This paper investigates the security vulnerabilities of adversarial-example-based image encryption by executing data reconstruction (DR) attacks on encrypted images. A representative image encryption method is the adversarial visual information hiding (AVIH), which uses type-I adversarial example training to protect gallery datasets used in image recognition tasks. In the AVIH method, the type-I adversarial example approach creates images that appear completely different but are still recognized by machines as the original ones. Additionally, the AVIH method can restore encrypted images to their original forms using a predefined private key generative model. For the best security, assigning a unique key to each image is recommended; however, storage limitations may necessitate some images sharing the same key model. This raises a crucial security question for AVIH: How many images can safely share the same key model without being compromised by a DR attack? To address this question, we introduce a dual-strategy DR attack against the AVIH encryption method by incorporating (1) generative-adversarial loss and (2) augmented identity loss, which prevent DR from overfitting -- an issue akin to that in machine learning. Our numerical results validate this approach through image recognition and re-identification benchmarks, demonstrating that our strategy can significantly enhance the quality of reconstructed images, thereby requiring fewer key-sharing encrypted images. Our source code to reproduce our results will be available soon.
Abstract:We address a joint trajectory planning, user association, resource allocation, and power control problem to maximize proportional fairness in the aerial IoT network, considering practical end-to-end quality-of-service (QoS) and communication schedules. Though the problem is rather ancient, apart from the fact that the previous approaches have never considered user- and time-specific QoS, we point out a prevalent mistake in coordinate optimization approaches adopted by the majority of the literature. Coordinate optimization approaches, which repetitively optimize radio resources for a fixed trajectory and vice versa, generally converge to local optima when all variables are differentiable. However, these methods often stagnate at a non-stationary point, significantly degrading the network utility in mixed-integer problems such as joint trajectory and radio resource optimization. We detour this problem by converting the formulated problem into the Markov decision process (MDP). Exploiting the beneficial characteristics of the MDP, we design a non-iterative framework that cooperatively optimizes trajectory and radio resources without initial trajectory choice. The proposed framework can incorporate various trajectory planning algorithms such as the genetic algorithm, tree search, and reinforcement learning. Extensive comparisons with diverse baselines verify that the proposed framework significantly outperforms the state-of-the-art method, nearly achieving the global optimum. Our implementation code is available at https://github.com/hslyu/dbspf.
Abstract:The growth in artificial intelligence (AI) technology has attracted substantial interests in age-of-information (AoI)-aware task offloading of mobile edge computing (MEC)-namely, minimizing service latency. Additionally, the use of MEC systems poses an additional problem arising from limited battery resources of MDs. This paper tackles the pressing challenge of AoI-aware distributed task offloading optimization, where user association (UA), resource allocation (RA), full-task offloading, and battery of mobile devices (MDs) are jointly considered. In existing studies, joint optimization of overall task offloading and UA is seldom considered due to the complexity of combinatorial optimization problems, and in cases where it is considered, linear objective functions such as power consumption are adopted. Revolutionizing the realm of MEC, our objective includes all major components contributing to users' quality of experience, including AoI and energy consumption. To achieve this, we first formulate an NP-hard combinatorial problem, where the objective function comprises three elements: communication latency, computation latency, and battery usage. We derive a closed-form RA solution of the problem; next, we provide a distributed pricing-based UA solution. We simulate the proposed algorithm for various vision and language AI tasks. Our numerical results show that the proposed method Pareto-dominates baseline methods. More specifically, the results demonstrate that the proposed method can outperform baseline methods by 1.62 times smaller AoI with 41.2% less energy consumption.
Abstract:Despite the extensive research on massive MIMO systems for 5G telecommunications and beyond, the reality is that many deployed base stations are equipped with a limited number of antennas rather than supporting massive MIMO configurations. Furthermore, while the cell-less network concept, which eliminates cell boundaries, is under investigation, practical deployments often grapple with significantly limited backhaul connection capacities between base stations. This letter explores techniques to maximize the sum-rate performance within the constraints of these more realistically equipped multicell networks. We propose an innovative approach that dramatically reduces the need for information exchange between base stations to a mere few bits, in stark contrast to conventional methods that require the exchange of hundreds of bits. Our proposed method not only addresses the limitations imposed by current network infrastructure but also showcases significantly improved performance under these constrained conditions.
Abstract:Model inversion (MI) attacks aim to reveal sensitive information in training datasets by solely accessing model weights. Generative MI attacks, a prominent strand in this field, utilize auxiliary datasets to recreate target data attributes, restricting the images to remain photo-realistic, but their success often depends on the similarity between auxiliary and target datasets. If the distributions are dissimilar, existing MI attack attempts frequently fail, yielding unrealistic or target-unrelated results. In response to these challenges, we introduce a groundbreaking approach named Patch-MI, inspired by jigsaw puzzle assembly. To this end, we build upon a new probabilistic interpretation of MI attacks, employing a generative adversarial network (GAN)-like framework with a patch-based discriminator. This approach allows the synthesis of images that are similar to the target dataset distribution, even in cases of dissimilar auxiliary dataset distribution. Moreover, we artfully employ a random transformation block, a sophisticated maneuver that crafts generalized images, thus enhancing the efficacy of the target classifier. Our numerical and graphical findings demonstrate that Patch-MI surpasses existing generative MI methods in terms of accuracy, marking significant advancements while preserving comparable statistical dataset quality. For reproducibility of our results, we make our source code publicly available in https://github.com/jonggyujang0123/Patch-Attack.
Abstract:Influence functions (IFs) elucidate how learning data affects model behavior. However, growing non-convexity and the number of parameters in modern large-scale models lead to imprecise influence approximation and instability in computations. We highly suspect that the first-order approximation in large models causes such fragility, as IFs change all parameters including possibly nuisance parameters that are irrelevant to the examined data. Thus, we attempt to selectively analyze parameters associated with the data. However, simply computing influence from the chosen parameters can be misleading, as it fails to nullify the subliminal impact of unselected parameters. Our approach introduces generalized IFs, precisely estimating target parameters' influence while considering fixed parameters' effects. Unlike the classic IFs, we newly adopt a method to identify pertinent target parameters closely associated with the analyzed data. Furthermore, we tackle computational instability with a robust inverse-Hessian-vector product approximation. Remarkably, the proposed approximation algorithm guarantees convergence regardless of the network configurations. We evaluated our approach on ResNet-18 and VGG-11 for class removal and backdoor model recovery. Modifying just 10\% of the network yields results comparable to the network retrained from scratch. Aligned with our first guess, we also confirm that modifying an excessive number of parameters results in a decline in network utility. We believe our proposal can become a versatile tool for model analysis across various AI domains, appealing to both specialists and general readers. Codes are available at https://github.com/hslyu/GIF.
Abstract:The robot market has been growing significantly and is expected to become 1.5 times larger in 2024 than what it was in 2019. Robots have attracted attention of security companies thanks to their mobility. These days, for security robots, unmanned aerial vehicles (UAVs) have quickly emerged by highlighting their advantage: they can even go to any hazardous place that humans cannot access. For UAVs, Drone has been a representative model and has several merits to consist of various sensors such as high-resolution cameras. Therefore, Drone is the most suitable as a mobile surveillance robot. These attractive advantages such as high-resolution cameras and mobility can be a double-edged sword, i.e., privacy infringement. Surveillance drones take videos with high-resolution to fulfill their role, however, those contain a lot of privacy sensitive information. The indiscriminate shooting is a critical issue for those who are very reluctant to be exposed. To tackle the privacy infringement, this work proposes face-anonymizing drone patrol system. In this system, one person's face in a video is transformed into a different face with facial components maintained. To construct our privacy-preserving system, we have adopted the latest generative adversarial networks frameworks and have some modifications on losses of those frameworks. Our face-anonymzing approach is evaluated with various public face-image and video dataset. Moreover, our system is evaluated with a customized drone consisting of a high-resolution camera, a companion computer, and a drone control computer. Finally, we confirm that our system can protect privacy sensitive information with our face-anonymzing algorithm while preserving the performance of robot perception, i.e., simultaneous localization and mapping.
Abstract:This paper presents an approach for recognizing human activities from extreme low resolution (e.g., 16x12) videos. Extreme low resolution recognition is not only necessary for analyzing actions at a distance but also is crucial for enabling privacy-preserving recognition of human activities. We design a new two-stream multi-Siamese convolutional neural network. The idea is to explicitly capture the inherent property of low resolution (LR) videos that two images originated from the exact same scene often have totally different pixel values depending on their LR transformations. Our approach learns the shared embedding space that maps LR videos with the same content to the same location regardless of their transformations. We experimentally confirm that our approach of jointly learning such transform robust LR video representation and the classifier outperforms the previous state-of-the-art low resolution recognition approaches on two public standard datasets by a meaningful margin.
Abstract:Privacy protection from surreptitious video recordings is an important societal challenge. We desire a computer vision system (e.g., a robot) that can recognize human activities and assist our daily life, yet ensure that it is not recording video that may invade our privacy. This paper presents a fundamental approach to address such contradicting objectives: human activity recognition while only using extreme low-resolution (e.g., 16x12) anonymized videos. We introduce the paradigm of inverse super resolution (ISR), the concept of learning the optimal set of image transformations to generate multiple low-resolution (LR) training videos from a single video. Our ISR learns different types of sub-pixel transformations optimized for the activity classification, allowing the classifier to best take advantage of existing high-resolution videos (e.g., YouTube videos) by creating multiple LR training videos tailored for the problem. We experimentally confirm that the paradigm of inverse super resolution is able to benefit activity recognition from extreme low-resolution videos.