Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Adil Karjauv

MoViE: Mobile Diffusion for Video Editing

Dec 09, 2024

Adil Karjauv, Noor Fathima, Ioannis Lelekas, Fatih Porikli, Amir Ghodrati, Amirhossein Habibian

Abstract:Recent progress in diffusion-based video editing has shown remarkable potential for practical applications. However, these methods remain prohibitively expensive and challenging to deploy on mobile devices. In this study, we introduce a series of optimizations that render mobile video editing feasible. Building upon the existing image editing model, we first optimize its architecture and incorporate a lightweight autoencoder. Subsequently, we extend classifier-free guidance distillation to multiple modalities, resulting in a threefold on-device speedup. Finally, we reduce the number of sampling steps to one by introducing a novel adversarial distillation scheme which preserves the controllability of the editing process. Collectively, these optimizations enable video editing at 12 frames per second on mobile devices, while maintaining high quality. Our results are available at https://qualcomm-ai-research.github.io/mobile-video-editing/

* 8 pages

Via

Access Paper or Ask Questions

Object-Centric Diffusion for Efficient Video Editing

Jan 11, 2024

Kumara Kahatapitiya, Adil Karjauv, Davide Abati, Fatih Porikli, Yuki M. Asano, Amirhossein Habibian

Figure 1 for Object-Centric Diffusion for Efficient Video Editing

Figure 2 for Object-Centric Diffusion for Efficient Video Editing

Figure 3 for Object-Centric Diffusion for Efficient Video Editing

Figure 4 for Object-Centric Diffusion for Efficient Video Editing

Abstract:Diffusion-based video editing have reached impressive quality and can transform either the global style, local structure, and attributes of given video inputs, following textual edit prompts. However, such solutions typically incur heavy memory and computational costs to generate temporally-coherent frames, either in the form of diffusion inversion and/or cross-frame attention. In this paper, we conduct an analysis of such inefficiencies, and suggest simple yet effective modifications that allow significant speed-ups whilst maintaining quality. Moreover, we introduce Object-Centric Diffusion, coined as OCD, to further reduce latency by allocating computations more towards foreground edited regions that are arguably more important for perceptual quality. We achieve this by two novel proposals: i) Object-Centric Sampling, decoupling the diffusion steps spent on salient regions or background, allocating most of the model capacity to the former, and ii) Object-Centric 3D Token Merging, which reduces cost of cross-frame attention by fusing redundant tokens in unimportant background regions. Both techniques are readily applicable to a given video editing model \textit{without} retraining, and can drastically reduce its memory and computational cost. We evaluate our proposals on inversion-based and control-signal-based editing pipelines, and show a latency reduction up to 10x for a comparable synthesis quality.

Via

Access Paper or Ask Questions

Investigating Top-$k$ White-Box and Transferable Black-box Attack

Mar 30, 2022

Chaoning Zhang, Philipp Benz, Adil Karjauv, Jae Won Cho, Kang Zhang, In So Kweon

Figure 1 for Investigating Top-$k$ White-Box and Transferable Black-box Attack

Figure 2 for Investigating Top-$k$ White-Box and Transferable Black-box Attack

Figure 3 for Investigating Top-$k$ White-Box and Transferable Black-box Attack

Figure 4 for Investigating Top-$k$ White-Box and Transferable Black-box Attack

Abstract:Existing works have identified the limitation of top-$1$ attack success rate (ASR) as a metric to evaluate the attack strength but exclusively investigated it in the white-box setting, while our work extends it to a more practical black-box setting: transferable attack. It is widely reported that stronger I-FGSM transfers worse than simple FGSM, leading to a popular belief that transferability is at odds with the white-box attack strength. Our work challenges this belief with empirical finding that stronger attack actually transfers better for the general top-$k$ ASR indicated by the interest class rank (ICR) after attack. For increasing the attack strength, with an intuitive interpretation of the logit gradient from the geometric perspective, we identify that the weakness of the commonly used losses lie in prioritizing the speed to fool the network instead of maximizing its strength. To this end, we propose a new normalized CE loss that guides the logit to be updated in the direction of implicitly maximizing its rank distance from the ground-truth class. Extensive results in various settings have verified that our proposed new loss is simple yet effective for top-$k$ attack. Code is available at: \url{https://bit.ly/3uCiomP}

* Accepted by CVPR2022

Via

Access Paper or Ask Questions

Adversarial Robustness Comparison of Vision Transformer and MLP-Mixer to CNNs

Oct 11, 2021

Philipp Benz, Soomin Ham, Chaoning Zhang, Adil Karjauv, In So Kweon

Figure 1 for Adversarial Robustness Comparison of Vision Transformer and MLP-Mixer to CNNs

Figure 2 for Adversarial Robustness Comparison of Vision Transformer and MLP-Mixer to CNNs

Figure 3 for Adversarial Robustness Comparison of Vision Transformer and MLP-Mixer to CNNs

Figure 4 for Adversarial Robustness Comparison of Vision Transformer and MLP-Mixer to CNNs

Abstract:Convolutional Neural Networks (CNNs) have become the de facto gold standard in computer vision applications in the past years. Recently, however, new model architectures have been proposed challenging the status quo. The Vision Transformer (ViT) relies solely on attention modules, while the MLP-Mixer architecture substitutes the self-attention modules with Multi-Layer Perceptrons (MLPs). Despite their great success, CNNs have been widely known to be vulnerable to adversarial attacks, causing serious concerns for security-sensitive applications. Thus, it is critical for the community to know whether the newly proposed ViT and MLP-Mixer are also vulnerable to adversarial attacks. To this end, we empirically evaluate their adversarial robustness under several adversarial attack setups and benchmark them against the widely used CNNs. Overall, we find that the two architectures, especially ViT, are more robust than their CNN models. Using a toy example, we also provide empirical evidence that the lower adversarial robustness of CNNs can be partially attributed to their shift-invariant property. Our frequency analysis suggests that the most robust ViT architectures tend to rely more on low-frequency features compared with CNNs. Additionally, we have an intriguing finding that MLP-Mixer is extremely vulnerable to universal adversarial perturbations.

* Code: https://github.com/phibenz/robustness_comparison_vit_mlp-mixer_cnn

Via

Access Paper or Ask Questions

Universal Adversarial Training with Class-Wise Perturbations

Apr 07, 2021

Philipp Benz, Chaoning Zhang, Adil Karjauv, In So Kweon

Figure 1 for Universal Adversarial Training with Class-Wise Perturbations

Figure 2 for Universal Adversarial Training with Class-Wise Perturbations

Figure 3 for Universal Adversarial Training with Class-Wise Perturbations

Figure 4 for Universal Adversarial Training with Class-Wise Perturbations

Abstract:Despite their overwhelming success on a wide range of applications, convolutional neural networks (CNNs) are widely recognized to be vulnerable to adversarial examples. This intriguing phenomenon led to a competition between adversarial attacks and defense techniques. So far, adversarial training is the most widely used method for defending against adversarial attacks. It has also been extended to defend against universal adversarial perturbations (UAPs). The SOTA universal adversarial training (UAT) method optimizes a single perturbation for all training samples in the mini-batch. In this work, we find that a UAP does not attack all classes equally. Inspired by this observation, we identify it as the source of the model having unbalanced robustness. To this end, we improve the SOTA UAT by proposing to utilize class-wise UAPs during adversarial training. On multiple benchmark datasets, our class-wise UAT leads superior performance for both clean accuracy and adversarial robustness against universal attack.

* Accepted to ICME 2021

Via

Access Paper or Ask Questions

A Survey On Universal Adversarial Attack

Mar 02, 2021

Chaoning Zhang, Philipp Benz, Chenguo Lin, Adil Karjauv, Jing Wu, In So Kweon

Figure 1 for A Survey On Universal Adversarial Attack

Figure 2 for A Survey On Universal Adversarial Attack

Figure 3 for A Survey On Universal Adversarial Attack

Abstract:Deep neural networks (DNNs) have demonstrated remarkable performance for various applications, meanwhile, they are widely known to be vulnerable to the attack of adversarial perturbations. This intriguing phenomenon has attracted significant attention in machine learning and what might be more surprising to the community is the existence of universal adversarial perturbations (UAPs), i.e. a single perturbation to fool the target DNN for most images. The advantage of UAP is that it can be generated beforehand and then be applied on-the-fly during the attack. With the focus on UAP against deep classifiers, this survey summarizes the recent progress on universal adversarial attacks, discussing the challenges from both the attack and defense sides, as well as the reason for the existence of UAP. Additionally, universal attacks in a wide range of applications beyond deep classification are also covered.

Via

Access Paper or Ask Questions

Universal Adversarial Perturbations Through the Lens of Deep Steganography: Towards A Fourier Perspective

Feb 12, 2021

Chaoning Zhang, Philipp Benz, Adil Karjauv, In So Kweon

Figure 1 for Universal Adversarial Perturbations Through the Lens of Deep Steganography: Towards A Fourier Perspective

Figure 2 for Universal Adversarial Perturbations Through the Lens of Deep Steganography: Towards A Fourier Perspective

Figure 3 for Universal Adversarial Perturbations Through the Lens of Deep Steganography: Towards A Fourier Perspective

Figure 4 for Universal Adversarial Perturbations Through the Lens of Deep Steganography: Towards A Fourier Perspective

Abstract:The booming interest in adversarial attacks stems from a misalignment between human vision and a deep neural network (DNN), i.e. a human imperceptible perturbation fools the DNN. Moreover, a single perturbation, often called universal adversarial perturbation (UAP), can be generated to fool the DNN for most images. A similar misalignment phenomenon has recently also been observed in the deep steganography task, where a decoder network can retrieve a secret image back from a slightly perturbed cover image. We attempt explaining the success of both in a unified manner from the Fourier perspective. We perform task-specific and joint analysis and reveal that (a) frequency is a key factor that influences their performance based on the proposed entropy metric for quantifying the frequency distribution; (b) their success can be attributed to a DNN being highly sensitive to high-frequency content. We also perform feature layer analysis for providing deep insight on model generalization and robustness. Additionally, we propose two new variants of universal perturbations: (1) Universal Secret Adversarial Perturbation (USAP) that simultaneously achieves attack and hiding; (2) high-pass UAP (HP-UAP) that is less visible to the human eye.

* Accepted to AAAI 2021

Via

Access Paper or Ask Questions

Towards Robust Data Hiding Against (JPEG) Compression: A Pseudo-Differentiable Deep Learning Approach

Dec 30, 2020

Chaoning Zhang, Adil Karjauv, Philipp Benz, In So Kweon

Figure 1 for Towards Robust Data Hiding Against (JPEG) Compression: A Pseudo-Differentiable Deep Learning Approach

Figure 2 for Towards Robust Data Hiding Against (JPEG) Compression: A Pseudo-Differentiable Deep Learning Approach

Figure 3 for Towards Robust Data Hiding Against (JPEG) Compression: A Pseudo-Differentiable Deep Learning Approach

Figure 4 for Towards Robust Data Hiding Against (JPEG) Compression: A Pseudo-Differentiable Deep Learning Approach

Abstract:Data hiding is one widely used approach for protecting authentication and ownership. Most multimedia content like images and videos are transmitted or saved in the compressed form. This kind of lossy compression, such as JPEG, can destroy the hidden data, which raises the need of robust data hiding. It is still an open challenge to achieve the goal of data hiding that can be against these compressions. Recently, deep learning has shown large success in data hiding, while non-differentiability of JPEG makes it challenging to train a deep pipeline for improving robustness against lossy compression. The existing SOTA approaches replace the non-differentiable parts with differentiable modules that perform similar operations. Multiple limitations exist: (a) large engineering effort; (b) requiring a white-box knowledge of compression attacks; (c) only works for simple compression like JPEG. In this work, we propose a simple yet effective approach to address all the above limitations at once. Beyond JPEG, our approach has been shown to improve robustness against various image and video lossy compression algorithms.

Via

Access Paper or Ask Questions

Robustness May Be at Odds with Fairness: An Empirical Study on Class-wise Accuracy

Oct 26, 2020

Philipp Benz, Chaoning Zhang, Adil Karjauv, In So Kweon

Figure 1 for Robustness May Be at Odds with Fairness: An Empirical Study on Class-wise Accuracy

Figure 2 for Robustness May Be at Odds with Fairness: An Empirical Study on Class-wise Accuracy

Abstract:Recently, convolutional neural networks (CNNs) have made significant advancement, however, they are widely known to be vulnerable to adversarial attacks. Adversarial training is the most widely used technique for improving adversarial robustness to strong white-box attacks. Prior works have been evaluating and improving the model average robustness without per-class evaluation. The average evaluation alone might provide a false sense of robustness. For example, the attacker can focus on attacking the vulnerable class, which can be dangerous, especially, when the vulnerable class is a critical one, such as "human" in autonomous driving. In this preregistration submission, we propose an empirical study on the class-wise accuracy and robustness of adversarially trained models. Given that the CIFAR10 training dataset has an equal number of samples for each class, interestingly, preliminary results on it with Resnet18 show that there exists inter-class discrepancy for accuracy and robustness on standard models, for instance, "cat" is more vulnerable than other classes. Moreover, adversarial training increases inter-class discrepancy. Our work aims to investigate the following questions: (a) is the phenomenon of inter-class discrepancy universal for other classification benchmark datasets on other seminal model architectures with various optimization hyper-parameters? (b) If so, what can be possible explanations for the inter-class discrepancy? (c) Can the techniques proposed in the long tail classification be readily extended to adversarial training for addressing the inter-class discrepancy?

Via

Access Paper or Ask Questions

Revisiting Batch Normalization for Improving Corruption Robustness

Oct 07, 2020

Philipp Benz, Chaoning Zhang, Adil Karjauv, In So Kweon

Figure 1 for Revisiting Batch Normalization for Improving Corruption Robustness

Figure 2 for Revisiting Batch Normalization for Improving Corruption Robustness

Figure 3 for Revisiting Batch Normalization for Improving Corruption Robustness

Figure 4 for Revisiting Batch Normalization for Improving Corruption Robustness

Abstract:Modern deep neural networks (DNN) have demonstrated remarkable success in image recognition tasks when the test dataset and training dataset are from the same distribution. In practical applications, however, this assumption is often not valid and results in performance drop when there is a domain shift. For example, the performance of DNNs trained on clean images has been shown to decrease when the test images have common corruptions, limiting their use in performance-sensitive applications. In this work, we interpret corruption robustness as a domain shift problem and propose to rectify batch normalization (BN) statistics for improving model robustness. This shift from the clean domain to the corruption domain can be interpreted as a style shift that is represented by the BN statistics. Straightforwardly, adapting BN statistics is beneficial for rectifying this style shift. Specifically, we find that simply estimating and adapting the BN statistics on a few (32 for instance) representation samples, without retraining the model, improves the corruption robustness by a large margin on several benchmark datasets with a wide range of model architectures. For example, on ImageNet-C, statistics adaptation improves the top1 accuracy from 40.2% to 49%. Moreover, we find that this technique can further improve state-of-the-art robust models from 59.0% to 63.5%.

Via

Access Paper or Ask Questions