
Shuhui Qu

Bernie

Patch-aware Vector Quantized Codebook Learning for Unsupervised Visual Defect Detection

Jan 15, 2025

Qisen Cheng, Shuhui Qu, Janghwan Lee

Abstract: Unsupervised visual defect detection is critical in industrial applications, requiring a representation space that captures normal data features while detecting deviations. Achieving a balance between expressiveness and compactness is challenging; an overly expressive space risks inefficiency and mode collapse, impairing detection accuracy. We propose a novel approach using an enhanced VQ-VAE framework optimized for unsupervised defect detection. Our model introduces a patch-aware dynamic code assignment scheme, enabling context-sensitive code allocation to optimize spatial representation. This strategy enhances normal-defect distinction and improves detection accuracy during inference. Experiments on MVTecAD, BTAD, and MTSD datasets show our method achieves state-of-the-art performance.

* 7 pages, accepted to the 36th IEEE ICTAI 2024
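
The abstract builds on the standard VQ-VAE codebook lookup. The sketch below is minimal: it assigns each patch feature to its nearest codebook vector by squared Euclidean distance. It does not reproduce the paper's patch-aware dynamic code assignment. Scoring a patch by its distance to the assigned code is a hypothetical illustration of how a compact codebook of normal patterns can expose deviations. All function names are illustrative.

```python
from typing import List, Sequence, Tuple

def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(features: List[Sequence[float]],
             codebook: List[Sequence[float]]) -> Tuple[List[int], List[Sequence[float]]]:
    """Standard VQ lookup: map each feature vector to its nearest codebook entry."""
    indices = [min(range(len(codebook)), key=lambda k: squared_distance(f, codebook[k]))
               for f in features]
    return indices, [codebook[k] for k in indices]

def anomaly_scores(features: List[Sequence[float]],
                   codebook: List[Sequence[float]]) -> List[float]:
    """Hypothetical per-patch score: distance to the assigned code (larger = less normal)."""
    _, quantized = quantize(features, codebook)
    return [squared_distance(f, c) for f, c in zip(features, quantized)]
```

Suppose the codebook was learned on defect-free patches. A patch that lies far from every code then receives a large score. For example, `[5.0, 5.0]` scores 32 against the codebook `[[0, 0], [1, 1]]`, while patches near either code score close to 0.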

Efficient Generation of Molecular Clusters with Dual-Scale Equivariant Flow Matching

Oct 10, 2024

Akshay Subramanian, Shuhui Qu, Cheol Woo Park, Sulin Liu, Janghwan Lee, Rafael Gómez-Bombarelli

Abstract: Amorphous molecular solids offer a promising alternative to inorganic semiconductors, owing to their mechanical flexibility and solution processability. The packing structure of these materials plays a crucial role in determining their electronic and transport properties, which are key to enhancing the efficiency of devices like organic solar cells (OSCs). However, obtaining these optoelectronic properties computationally requires molecular dynamics (MD) simulations to generate a conformational ensemble, a process that can be computationally expensive due to the large system sizes involved. Recent advances have focused on using generative models, particularly flow-based models as Boltzmann generators, to improve the efficiency of MD sampling. In this work, we developed a dual-scale flow matching method that separates training and inference into coarse-grained and all-atom stages and enhances both the accuracy and efficiency of standard flow matching samplers. We demonstrate the effectiveness of this method on a dataset of Y6 molecular clusters obtained through MD simulations, and we benchmark its efficiency and accuracy against single-scale flow matching methods.
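
The "standard flow matching samplers" that the abstract compares against can be sketched as follows. This is plain single-scale flow matching with a linear interpolation path, which is one common choice and an assumption here. The sketch covers the training target and an Euler sampler. The dual-scale coarse-grained/all-atom split and any equivariant architecture are not reproduced. In practice `v` would be a trained neural network rather than a hand-written function.

```python
from typing import Callable, List

Vector = List[float]
VelocityField = Callable[[Vector, float], Vector]

def interpolate(x0: Vector, x1: Vector, t: float) -> Vector:
    """Point at time t on the straight path from a noise sample x0 to a data sample x1."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0: Vector, x1: Vector) -> Vector:
    """Conditional velocity of that path; constant in t for linear interpolation."""
    return [b - a for a, b in zip(x0, x1)]

def flow_matching_loss(v: VelocityField, x0: Vector, x1: Vector, t: float) -> float:
    """Squared error between the model velocity at x_t and the target velocity."""
    pred = v(interpolate(x0, x1, t), t)
    return sum((p - u) ** 2 for p, u in zip(pred, target_velocity(x0, x1)))

def euler_sample(v: VelocityField, x0: Vector, steps: int = 10) -> Vector:
    """Generate a sample by integrating dx/dt = v(x, t) from t = 0 to t = 1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        vel = v(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, vel)]
    return x
```

With an oracle field that always returns `x1 - x0`, the loss is zero and Euler integration carries `x0` exactly onto `x1`. A learned field only approximates this, so fewer integration steps trade accuracy for speed.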

SHAPNN: Shapley Value Regularized Tabular Neural Network

Sep 15, 2023

Qisen Cheng, Shuhui Qu, Janghwan Lee

Abstract: We present SHAPNN, a novel deep tabular data modeling architecture designed for supervised learning. Our approach leverages Shapley values, a well-established technique for explaining black-box models. Our neural network is trained using standard backpropagation and is regularized with Shapley values estimated in real time. Our method offers several advantages, including the ability to provide valid explanations for individual data instances and for whole datasets with no computational overhead. Additionally, prediction with explanation serves as a regularizer, which improves the model's performance. Moreover, the regularized prediction enhances the model's capability for continual learning. We evaluate our method on various publicly available datasets and compare it with state-of-the-art deep neural network models, demonstrating the superior performance of SHAPNN in terms of AUROC, transparency, and robustness to streaming data.

* 9 pages, 8 figures
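
For reference, here is the exact definition of the Shapley values that SHAPNN estimates. Each feature i receives the weighted average of its marginal contribution v(S ∪ {i}) − v(S) over all coalitions S of the other features, with weight |S|! (n − |S| − 1)! / n!. The sketch assumes that "absent" features are replaced by a baseline vector, which is one common convention. Its cost is exponential in the number of features. It is not SHAPNN's real-time estimator.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Set

def shapley_values(model: Callable[[List[float]], float],
                   x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of each feature of x, with absent features set to baseline values."""
    n = len(x)

    def value(subset: Set[int]) -> float:
        # Coalition value: features in `subset` take their real values, the rest the baseline.
        z = [x[j] if j in subset else baseline[j] for j in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight |S|! (n - |S| - 1)! / n! for coalitions S of this size that exclude i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                s = set(combo)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi
```

For a linear model the values reduce to each coefficient times the feature's offset from the baseline. For any model they sum to f(x) − f(baseline), the "efficiency" property.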

Error-aware Quantization through Noise Tempering

Dec 11, 2022

Zheng Wang, Juncheng B Li, Shuhui Qu, Florian Metze, Emma Strubell

Abstract: Quantization has become a predominant approach for model compression, enabling large models trained on GPUs to be deployed for inference on smaller form-factor devices. Quantization-aware training (QAT) optimizes model parameters with respect to the end task while simulating quantization error, leading to better performance than post-training quantization. Gradients through the non-differentiable quantization operator are typically approximated using the straight-through estimator (STE) or additive noise. However, STE-based methods suffer from instability due to biased gradients, whereas existing noise-based methods cannot reduce the resulting variance. In this work, we incorporate exponentially decaying quantization-error-aware noise together with a learnable scale of the task-loss gradient to approximate the effect of a quantization operator. We show that this method combines gradient scale and quantization noise in a better-optimized way, providing finer-grained gradient estimates at the bin size of each weight and activation layer's quantizer. Our controlled noise also contains an implicit curvature term that could encourage flatter minima, which our experiments show is indeed the case. Experiments training ResNet architectures on the CIFAR-10, CIFAR-100, and ImageNet benchmarks show that our method obtains state-of-the-art top-1 classification accuracy for uniform (non-mixed-precision) quantization, outperforming previous methods by 0.5-1.2% absolute.
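
The abstract contrasts two standard ways of getting gradients through rounding. The sketch below shows both, assuming a symmetric uniform quantizer with step size `step` and `levels` bins on each side. The STE treats rounding as the identity inside the clipping range. A noise-based surrogate replaces the rounding error with uniform noise during training. The exponential decay of the noise scale is only a hypothetical schedule in the spirit of the description. The paper's quantization-error-aware noise and its learnable gradient scale are not reproduced.

```python
import random

def fake_quantize(w: float, step: float, levels: int) -> float:
    """Symmetric uniform quantizer: nearest multiple of `step`, clamped to [-levels, levels] bins.
    Note: Python's round() sends exact halves to the nearest even integer."""
    q = max(-levels, min(levels, round(w / step)))
    return q * step

def ste_grad(w: float, step: float, levels: int, upstream: float) -> float:
    """Straight-through estimator: pass the upstream gradient unchanged inside the
    clipping range and return zero for weights clipped outside it."""
    return upstream if abs(w) <= levels * step else 0.0

def noisy_surrogate(w: float, step: float, scale: float, rng: random.Random) -> float:
    """Additive-noise surrogate: uniform noise spanning `scale` quantizer bins stands in for rounding."""
    return w + rng.uniform(-0.5, 0.5) * step * scale

def decayed_scale(initial: float, decay: float, epoch: int) -> float:
    """Hypothetical exponential schedule that shrinks the noise scale as training proceeds."""
    return initial * decay ** epoch
```

The STE's gradient ignores the rounding error entirely, which is the bias the abstract mentions. The noise surrogate is unbiased in the forward pass but adds variance. Shrinking its scale over epochs is one simple way to reduce that variance late in training.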

SQuAT: Sharpness- and Quantization-Aware Training for BERT

Oct 13, 2022

Zheng Wang, Juncheng B Li, Shuhui Qu, Florian Metze, Emma Strubell

Abstract: Quantization is an effective technique for reducing the memory footprint, inference latency, and power consumption of deep learning models. However, existing quantization methods suffer from accuracy degradation relative to full-precision (FP) models due to the errors introduced by coarse gradient estimation through non-differentiable quantization layers. Sharp local minima in the loss landscapes of overparameterized models (e.g., Transformers) tend to aggravate this performance penalty in low-bit (2- and 4-bit) settings. In this work, we propose sharpness- and quantization-aware training (SQuAT), which encourages the model to converge to flatter minima while performing quantization-aware training. Our method alternates training between a sharpness objective and a step-size objective, which could let the model learn the most suitable parameter update magnitude to converge near flat minima. Extensive experiments show that our method consistently outperforms state-of-the-art quantized BERT models under 2-, 3-, and 4-bit settings on GLUE benchmarks by 1%, and sometimes even outperforms full-precision (32-bit) models. Our empirical measurements of sharpness also suggest that our method leads to flatter minima than other quantization methods.
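
A sharpness objective of this kind is usually built on sharpness-aware minimization (SAM). SAM first steps to the approximate worst-case point within an L2 ball of radius `rho` around the weights. It then descends using the gradient measured there. The sketch below is the generic SAM update on plain Python lists. It does not reproduce SQuAT's alternation with a step-size objective, and the function names are illustrative.

```python
import math
from typing import Callable, List

Vector = List[float]

def sam_step(w: Vector, grad_fn: Callable[[Vector], Vector], lr: float, rho: float) -> Vector:
    """One sharpness-aware minimization (SAM) update."""
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return list(w)  # stationary point: no ascent direction to probe
    # Move to the first-order worst point inside the L2 ball of radius rho.
    eps = [rho * gi / norm for gi in g]
    g_adv = grad_fn([wi + ei for wi, ei in zip(w, eps)])
    # Descend from the original weights using the gradient taken at the perturbed point.
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]
```

Consider the quadratic loss 0.5·(w₀² + 10·w₁²) starting from `w = [1.0, 0.0]` with `lr = rho = 0.1`. The perturbed gradient is 1.1 instead of 1.0, so the update lands at 0.89 rather than plain gradient descent's 0.9. Sharper directions are penalized more strongly in the same way.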

Robustness of Neural Architectures for Audio Event Detection

May 06, 2022

Juncheng B Li, Shuhui Qu, Florian Metze

Abstract: Traditionally, in the audio recognition pipeline, noise is suppressed by the "frontend", which relies on preprocessing techniques such as speech enhancement. However, there is no guarantee that noise will not cascade into downstream components. To understand the actual influence of noise on the entire audio pipeline, in this paper we directly investigate the impact of noise on different types of neural models without the preprocessing step. We measure the recognition performance of 4 different neural network models on environmental sound classification under 3 types of noise: occlusion (to emulate intermittent noise), Gaussian noise (to model continuous noise), and adversarial perturbations (the worst-case scenario). Our intuition is that the different ways in which these models process their input (e.g., CNNs have strong locality inductive biases, which Transformers lack) should lead to observable differences in performance and/or robustness, and understanding these differences will enable further improvements. We perform extensive experiments on AudioSet, the largest weakly labeled sound event dataset available. We also seek to explain the behaviors of different models through changes in output distribution and through weight visualization.
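
The abstract names two non-adversarial corruptions. Below is a minimal sketch of each applied to a raw waveform stored as a list of samples. The paper's exact occlusion lengths and noise levels are not stated here, so two choices are assumptions: occlusion zeroes one random contiguous segment covering a fraction `frac` of the clip, and Gaussian noise is scaled to a target signal-to-noise ratio in dB. Adversarial perturbations are not covered by this sketch.

```python
import math
import random
from typing import List

def occlude(signal: List[float], frac: float, rng: random.Random) -> List[float]:
    """Zero out one random contiguous segment covering `frac` of the samples (intermittent noise)."""
    n = len(signal)
    width = int(round(frac * n))
    start = rng.randrange(0, n - width + 1)
    out = list(signal)
    for i in range(start, start + width):
        out[i] = 0.0
    return out

def add_gaussian_noise(signal: List[float], snr_db: float, rng: random.Random) -> List[float]:
    """Add white Gaussian noise scaled to a target signal-to-noise ratio in dB (continuous noise)."""
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_std = math.sqrt(signal_power / 10 ** (snr_db / 10))
    return [x + rng.gauss(0.0, noise_std) for x in signal]
```

Parameterizing the Gaussian noise by SNR rather than by an absolute standard deviation keeps the corruption comparable across loud and quiet clips.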

AudioTagging Done Right: 2nd comparison of deep learning methods for environmental sound classification

Apr 03, 2022

Juncheng B Li, Shuhui Qu, Po-Yao Huang, Florian Metze

Abstract: After their sweeping success in vision and language tasks, pure attention-based neural architectures (e.g., DeiT) are rising to the top of audio tagging (AT) leaderboards, seemingly rendering traditional convolutional neural networks (CNNs), feed-forward networks, and recurrent networks obsolete. However, a closer look reveals great variability in published research: for instance, models initialized with pretrained weights perform drastically differently from those trained without pretraining, training time for a model varies from hours to weeks, and the essentials are often hidden in seemingly trivial details. This urgently calls for a comprehensive study, since our first comparison is half a decade old. In this work, we perform extensive experiments on AudioSet, the largest weakly labeled sound event dataset available, and analyze data quality and efficiency. We compare a few state-of-the-art baselines on the AT task and study the performance and efficiency of 2 major categories of neural architectures: CNN variants and attention-based variants. We also closely examine their optimization procedures. Our open-sourced experimental results provide insights into the trade-offs among performance, efficiency, and the optimization process for both practitioners and researchers. Implementation: https://github.com/lijuncheng16/AudioTaggingDoneRight

On Adversarial Robustness of Large-scale Audio Visual Learning

Mar 23, 2022

Juncheng B Li, Shuhui Qu, Xinjian Li, Po-Yao Huang, Florian Metze

Abstract: As audio-visual systems are being deployed for safety-critical tasks such as surveillance and malicious content filtering, their robustness remains an under-studied area. Existing published work on robustness either does not scale to large datasets or does not deal with multiple modalities. This work aims to study several key questions related to multi-modal learning through the lens of robustness: 1) Are multi-modal models necessarily more robust than uni-modal models? 2) How can the robustness of multi-modal learning be measured efficiently? 3) How should different modalities be fused to achieve a more robust multi-modal model? To understand the robustness of multi-modal models in a large-scale setting, we propose a density-based metric and a convexity metric to efficiently measure the distribution of each modality in high-dimensional latent space. Our work provides theoretical intuition together with empirical evidence showing, through these metrics, how multi-modal fusion affects adversarial robustness. We further devise a mix-up strategy based on our metrics to improve the robustness of the trained model. Our experiments on AudioSet and Kinetics-Sounds verify our hypothesis that multi-modal models are not necessarily more robust than their uni-modal counterparts in the face of adversarial examples. We also observe that our mix-up training method can achieve as much protection as traditional adversarial training, offering a computationally cheap alternative. Implementation: https://github.com/lijuncheng16/AudioSetDoneRight

* 2022 International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2022)
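
The mix-up strategy above is guided by the paper's density and convexity metrics, which are not reproduced here. The sketch shows only the generic mixup operation it builds on. Two inputs and their one-hot labels are blended with the same weight λ, drawn from a Beta(α, α) distribution. The value of α and the function name are illustrative assumptions.

```python
import random
from typing import List, Tuple

def mixup(x1: List[float], y1: List[float], x2: List[float], y2: List[float],
          alpha: float, rng: random.Random) -> Tuple[List[float], List[float], float]:
    """Generic mixup: convex combination of two examples and their labels with lam ~ Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam
```

Mixing one-hot labels yields a soft label that still sums to 1. Training on such interpolated examples encourages smoother behavior between classes, which is why mixup-style training can act as a cheap robustness regularizer.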

Audio-Visual Event Recognition through the lens of Adversary

Nov 15, 2020

Juncheng B Li, Kaixin Ma, Shuhui Qu, Po-Yao Huang, Florian Metze

Abstract: As audio/visual classification models are widely deployed for sensitive tasks like content filtering at scale, it is critical to understand their robustness as well as to improve their accuracy. This work aims to study several key questions related to multimodal learning through the lens of adversarial noise: 1) How does the choice of early, middle, or late fusion trade off robustness and accuracy? 2) How do different frequency- and time-domain features contribute to robustness? 3) How do different neural modules contribute to the adversarial noise? In our experiments, we construct adversarial examples to attack state-of-the-art neural models trained on Google AudioSet. We compare how much attack potency, in terms of adversarial perturbation size ε under different Lp norms, is needed to "deactivate" the victim model. Using adversarial noise to ablate multimodal models, we provide insights into the best potential fusion strategy for balancing model parameters/accuracy against robustness, and we distinguish the robust features from the non-robust features that various neural network models tend to learn.

* 4 pages
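
The abstract measures attack potency by the perturbation size ε under different Lp norms. For intuition, the sketch below implements the projections onto the L∞ and L2 balls of radius ε. Iterative attacks such as projected gradient descent apply these projections after each step. For a purely linear score w·x, which is a simplifying assumption rather than the paper's models, the sketch also gives the exact worst-case perturbation. Function names are illustrative.

```python
import math
from typing import List

Vector = List[float]

def project_linf(delta: Vector, eps: float) -> Vector:
    """Project a perturbation onto the L-infinity ball of radius eps (clip each coordinate)."""
    return [max(-eps, min(eps, d)) for d in delta]

def project_l2(delta: Vector, eps: float) -> Vector:
    """Project a perturbation onto the L2 ball of radius eps (rescale it if it is too long)."""
    norm = math.sqrt(sum(d * d for d in delta))
    if norm <= eps:
        return list(delta)
    return [d * eps / norm for d in delta]

def worst_case_linear(w: Vector, eps: float, norm: str) -> Vector:
    """Perturbation of size eps that most decreases the linear score w . x."""
    if norm == "linf":
        return [-eps * ((wi > 0) - (wi < 0)) for wi in w]  # -eps * sign(w)
    if norm == "l2":
        length = math.sqrt(sum(wi * wi for wi in w))
        return [0.0] * len(w) if length == 0 else [-eps * wi / length for wi in w]
    raise ValueError(f"unsupported norm: {norm}")

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))
```

With `w = [3.0, -4.0]` and `eps = 0.5`, the L∞ attack lowers the score by ε·‖w‖₁ = 3.5, while the L2 attack lowers it by ε·‖w‖₂ = 2.5. The same ε therefore buys different attack potency depending on the norm.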

Adversarial Music: Real World Audio Adversary Against Wake-word Detection System

Dec 06, 2019

Juncheng B. Li, Shuhui Qu, Xinjian Li, Joseph Szurley, J. Zico Kolter, Florian Metze

Abstract: Voice assistants (VAs) such as Amazon Alexa or Google Assistant rely on wake-word detection to respond to people's commands, which could make them vulnerable to audio adversarial examples. In this work, we target our attack on the wake-word detection system, jamming the model with inconspicuous background music to deactivate the VA while our audio adversary is present. We implemented an emulated wake-word detection system of Amazon Alexa based on recent publications and validated our models against the real Alexa in terms of wake-word detection accuracy. We then computed our audio adversaries using expectation over transformation and implemented them with a differentiable synthesizer. Next, we verified our audio adversaries digitally on hundreds of samples of utterances collected from the real world. Our experiments show that we can effectively reduce the recognition F1 score of our emulated model from 93.4% to 11.0%. Finally, we tested our audio adversary over the air and verified that it works effectively against Alexa, reducing its F1 score from 92.5% to 11.0%. We also verified that non-adversarial music does not disable Alexa as effectively as our music at the same sound level. To the best of our knowledge, this is the first real-world adversarial attack against a commercial-grade VA wake-word detection system. Our code and demo videos can be accessed at https://www.junchengbillyli.com/AdversarialMusic

* NeurIPS 2019, pp. 11908–11918, Curran Associates, Inc. URL: http://papers.nips.cc/paper/9362-adversarial-music-real-world-audio-adversary-against-wake-word-detection-system.pdf
* 9 pages, in Proceedings of the NeurIPS 2019 Conference