Rui Lin

Towards Understanding the Nature of Attention with Low-Rank Sparse Decomposition

Apr 29, 2025

Zhengfu He, Junxuan Wang, Rui Lin, Xuyang Ge, Wentao Shu, Qiong Tang, Junping Zhang, Xipeng Qiu

Abstract: We propose Low-Rank Sparse Attention (Lorsa), a sparse replacement model for Transformer attention layers that disentangles the original Multi-Head Self-Attention (MHSA) into individually comprehensible components. Lorsa is designed to address the challenge of attention superposition and thereby to understand attention-mediated interactions between features at different token positions. We show that Lorsa heads find cleaner and finer-grained versions of previously discovered MHSA behaviors such as induction heads, successor heads, and attention sink behavior (i.e., heavily attending to the first token). Lorsa and Sparse Autoencoders (SAEs) are both sparse dictionary learning methods applied to different Transformer components, and they lead to consistent findings in many ways. For instance, we discover a comprehensive family of arithmetic-specific Lorsa heads in Llama-3.1-8B, each corresponding to an atomic operation. Automated interpretability analysis indicates that Lorsa achieves parity with SAEs in interpretability while exhibiting superior circuit discovery properties, especially for features computed collectively by multiple MHSA heads. We also conduct extensive experiments on architectural design ablations, Lorsa scaling laws, and error analysis.
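
A toy sketch of the sparse-attention idea, assuming (from our reading of the Lorsa description, not from the paper's code) that each head reads a single scalar value per source token (a rank-1 OV circuit) and that only the top-k heads by activation magnitude write to the output at each position. The head fields `q`, `k`, `v`, `o`, the function names and the rank-1 query-key score are hypothetical simplifications chosen for brevity.

```python
import math


def softmax(xs):
    # numerically stable softmax over a list of scores
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def lorsa_like_layer(xs, heads, top_k):
    """Toy sparse attention over a sequence of residual vectors xs.

    Each hypothetical head is a dict with:
      "q", "k": query/key directions (rank-1 QK here, purely for brevity),
      "v": read direction, so each source token yields ONE scalar (rank-1 OV),
      "o": write direction added to the output.
    Returns, per position, (output vector, sorted indices of active heads).
    """
    outputs = []
    for t in range(len(xs)):
        acts = []
        for h in heads:
            # causal attention scores over source positions 0..t
            scores = [dot(h["q"], xs[t]) * dot(h["k"], xs[s]) for s in range(t + 1)]
            pattern = softmax(scores)
            # head activation: attention-weighted sum of scalar values
            acts.append(sum(p * dot(h["v"], xs[s]) for s, p in enumerate(pattern)))
        # top-k sparsity: only the strongest heads write to the output
        keep = sorted(range(len(heads)), key=lambda i: abs(acts[i]), reverse=True)[:top_k]
        out = [0.0] * len(xs[0])
        for i in keep:
            out = [o + acts[i] * w for o, w in zip(out, heads[i]["o"])]
        outputs.append((out, sorted(keep)))
    return outputs
```

Raising `top_k` lets more heads contribute at each position, the usual sparsity-versus-reconstruction tradeoff of dictionary learning.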

A Unifying Tensor View for Lightweight CNNs

Dec 15, 2023

Jason Chun Lok Li, Rui Lin, Jiajun Zhou, Edmund Yin Mun Lam, Ngai Wong

Abstract: Although the decomposition of convolutional kernels for lightweight CNNs is well studied, existing works that rely on tensor network diagrams or hyperdimensional abstraction lack geometric intuition. This work devises a new perspective by linking a 3D-reshaped kernel tensor to its various slice-wise and rank-1 decompositions, permitting a straightforward connection between various tensor approximations and efficient CNN modules. Specifically, we find that a pointwise-depthwise-pointwise (PDP) configuration constitutes a viable construct for lightweight CNNs. Moreover, a novel link to the latest ShiftNet is established, inspiring a first-ever shift layer pruning that achieves nearly 50% compression with < 1% drop in accuracy for ShiftResNet.

* 4 pages, 3 figures, accepted at the 2023 IEEE 15th International Conference on ASIC (ASICON 2023)
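
One way to see why a PDP module fits naturally, assuming the k x k kernel is reshaped into a C_out x C_in x k^2 tensor with a rank-R CP (sum of rank-1 terms) decomposition: the three factor matrices act as a pointwise, a depthwise and a second pointwise convolution, so both routes give the same output at every spatial location. The factor names `A`, `B`, `C`, the rank `R` and the function names are illustrative; bias, stride and padding are ignored.

```python
def cp_kernel(A, B, C):
    """Rebuild W[o][i][s] = sum_r A[o][r] * B[i][r] * C[s][r].

    A: C_out x R, B: C_in x R, C: (k*k) x R (hypothetical CP factors).
    """
    R = len(A[0])
    return [[[sum(A[o][r] * B[i][r] * C[s][r] for r in range(R))
              for s in range(len(C))]
             for i in range(len(B))]
            for o in range(len(A))]


def dense_conv_at(W, X):
    """Dense conv output at one location; X[i][s] is the flattened k*k patch of channel i."""
    return [sum(W[o][i][s] * X[i][s] for i in range(len(X)) for s in range(len(X[0])))
            for o in range(len(W))]


def pdp_conv_at(A, B, C, X):
    """Same output computed as pointwise -> depthwise -> pointwise."""
    R = len(A[0])
    n_pos = len(X[0])
    # pointwise: mix C_in input channels into R intermediate channels
    Z = [[sum(B[i][r] * X[i][s] for i in range(len(X))) for s in range(n_pos)]
         for r in range(R)]
    # depthwise: one k*k filter per intermediate channel
    U = [sum(C[s][r] * Z[r][s] for s in range(n_pos)) for r in range(R)]
    # pointwise: mix R intermediate channels into C_out outputs
    return [sum(A[o][r] * U[r] for r in range(R)) for o in range(len(A))]


def param_counts(c_in, c_out, k, R):
    """(dense kernel weights, PDP weights), biases ignored."""
    return c_out * c_in * k * k, c_in * R + R * k * k + R * c_out
```

For C_in = C_out = 64, k = 3 and R = 64, the dense kernel has 36,864 weights while the PDP route needs 8,768.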

Lite it fly: An All-Deformable-Butterfly Network

Nov 14, 2023

Rui Lin, Jason Chun Lok Li, Jiajun Zhou, Binxiao Huang, Jie Ran, Ngai Wong

Abstract: Most deep neural networks (DNNs) consist fundamentally of convolutional and/or fully connected layers, wherein the linear transform can be cast as the product between a filter matrix and a data matrix obtained by arranging feature tensors into columns. The recently proposed deformable butterfly (DeBut) decomposes the filter matrix into generalized, butterfly-like factors, thus achieving network compression orthogonal to the traditional approaches of pruning or low-rank decomposition. This work reveals an intimate link between DeBut and a systematic hierarchy of depthwise and pointwise convolutions, which explains the empirically good performance of DeBut layers. By developing an automated DeBut chain generator, we show for the first time the viability of homogenizing a DNN into all DeBut layers, thus achieving extreme sparsity and compression. Various examples and hardware benchmarks verify the advantages of All-DeBut networks. In particular, we show it is possible to compress a PointNet to < 5% of its parameters with < 5% accuracy drop, a record not achievable by other compression schemes.

* 7 pages, 3 figures, accepted as a brief paper in IEEE Transactions on Neural Networks and Learning Systems (TNNLS)
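
The filter-matrix/data-matrix view in the abstract is the standard im2col construction. A minimal sketch, assuming stride 1 and no padding: each k x k x C_in patch becomes one column, each filter becomes one row, and a single matrix product reproduces the convolution. A DeBut layer would replace the dense filter matrix `F` with a product of sparse butterfly-like factors, which is not shown here; all function names are illustrative.

```python
def im2col(X, k):
    """X[c][h][w] -> data matrix with one column per output location (stride 1, no padding)."""
    C, H, W = len(X), len(X[0]), len(X[0][0])
    cols = []
    for i in range(H - k + 1):
        for j in range(W - k + 1):
            # flatten the patch in (channel, row, col) order
            cols.append([X[c][i + di][j + dj]
                         for c in range(C) for di in range(k) for dj in range(k)])
    # transpose so that each patch is a column
    return [list(row) for row in zip(*cols)]


def matmul(P, Q):
    return [[sum(P[r][t] * Q[t][c] for t in range(len(Q))) for c in range(len(Q[0]))]
            for r in range(len(P))]


def conv_as_matmul(Wt, X):
    """Wt[o][c][di][dj]: each filter is flattened into one row of the filter matrix F."""
    k = len(Wt[0][0])
    F = [[w for ch in filt for row in ch for w in row] for filt in Wt]
    return matmul(F, im2col(X, k))


def direct_conv(Wt, X):
    """Reference sliding-window convolution (cross-correlation), same output layout."""
    k = len(Wt[0][0])
    C, H, W = len(X), len(X[0]), len(X[0][0])
    return [[sum(Wt[o][c][di][dj] * X[c][i + di][j + dj]
                 for c in range(C) for di in range(k) for dj in range(k))
             for i in range(H - k + 1) for j in range(W - k + 1)]
            for o in range(len(Wt))]
```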

Cluster-based Method for Eavesdropping Identification and Localization in Optical Links

Sep 25, 2023

Haokun Song, Rui Lin, Andrea Sgambelluri, Filippo Cugini, Yajie Li, Jie Zhang, Paolo Monti

Abstract: We propose a cluster-based method to detect and locate eavesdropping events in optical line systems characterized by small power losses. Our findings indicate that such subtle eavesdropping losses can be detected solely from optical performance monitoring (OPM) data collected at the receiver, whereas localizing such events can be effectively achieved by leveraging in-line OPM data.

* 4 pages, 6 figures, Asia Communications and Photonics Conference (ACP) 2023
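
The abstract does not spell out the clustering algorithm or the features used, so the following is only a hypothetical illustration of the two steps: a plain two-cluster k-means separates scalar OPM readings into a nominal group and a slightly lossy group, and a threshold on in-line power relative to a baseline flags the first monitor downstream of the loss. The function names, the dB feature and the 0.5 dB threshold are assumptions.

```python
def two_means_1d(values, iters=50):
    """Plain 1-D k-means with k = 2; returns (centroids, labels)."""
    centroids = [min(values), max(values)]
    labels = [0] * len(values)
    for _ in range(iters):
        # assignment step: nearest centroid (ties go to cluster 0)
        labels = [0 if abs(v - centroids[0]) <= abs(v - centroids[1]) else 1
                  for v in values]
        # update step: move each centroid to the mean of its members
        for c in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, labels


def locate_drop(inline_power_db, baseline_db, threshold_db=0.5):
    """Index of the first in-line monitor whose power falls below its baseline
    by more than threshold_db (hypothetical rule), or None if none does."""
    for idx, (p, b) in enumerate(zip(inline_power_db, baseline_db)):
        if b - p > threshold_db:
            return idx
    return None
```

A real detector would likely cluster richer, multi-dimensional OPM features; the one-dimensional version only shows the mechanics.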

A Spectral Perspective towards Understanding and Improving Adversarial Robustness

Jun 25, 2023

Binxiao Huang, Rui Lin, Chaofan Tao, Ngai Wong

Abstract: Deep neural networks (DNNs) are incredibly vulnerable to crafted, imperceptible adversarial perturbations. While adversarial training (AT) has proven to be an effective defense approach, the mechanism by which AT improves robustness is not fully understood. This work investigates AT from a spectral perspective, adding new insights to the design of effective defenses. In particular, we show that AT induces the deep model to focus more on the low-frequency region, which retains the shape-biased representations, to gain robustness. Further, we find that the spectrum of a white-box attack is primarily distributed in regions the model focuses on, and the perturbation attacks the spectral bands where the model is vulnerable. Based on this observation, to train a model tolerant to frequency-varying perturbations, we propose a spectral alignment regularization (SAR) such that the spectral output inferred from an attacked adversarial input stays as close as possible to that of its natural input counterpart. Experiments demonstrate that SAR and its weight averaging (WA) extension significantly improve robust accuracy by 1.14% to 3.87% relative to standard AT, across multiple datasets (CIFAR-10, CIFAR-100 and Tiny ImageNet) and various attacks (PGD, C&W and AutoAttack), without any extra data.
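
A minimal sketch of a spectral alignment term, assuming it compares the magnitude spectra (via a DFT) of the model outputs for a natural input and its adversarial counterpart. The exact transform, norm and weighting used for SAR are not given in the abstract, so `lam` and all function names are hypothetical.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a real sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]


def spectral_alignment(out_nat, out_adv):
    """Mean squared difference between the magnitude spectra of two output vectors."""
    a = [abs(z) for z in dft(out_nat)]
    b = [abs(z) for z in dft(out_adv)]
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def regularized_loss(task_loss, out_nat, out_adv, lam=1.0):
    # lam: hypothetical weight of the alignment term relative to the AT loss
    return task_loss + lam * spectral_alignment(out_nat, out_adv)
```

Magnitudes are compared on purpose: with full complex spectra and an unnormalized DFT, Parseval's theorem makes the squared distance exactly n times the ordinary squared L2 distance between the outputs, so the spectral view would add nothing.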

Frequency Regularization for Improving Adversarial Robustness

Dec 24, 2022

Binxiao Huang, Chaofan Tao, Rui Lin, Ngai Wong

Abstract: Deep neural networks are incredibly vulnerable to crafted, human-imperceptible adversarial perturbations. Although adversarial training (AT) has proven to be an effective defense approach, we find that AT-trained models rely heavily on the low-frequency content of the input for judgment, which accounts for their low standard accuracy. To close the large gap between standard and robust accuracies during AT, we investigate the frequency difference between clean and adversarial inputs and propose a frequency regularization (FR) to align the output difference in the spectral domain. In addition, we find that Stochastic Weight Averaging (SWA), by smoothing the kernels over epochs, further improves robustness. Among various defense schemes, our method achieves the strongest robustness against PGD-20, C&W and AutoAttack attacks on a WideResNet trained on CIFAR-10 without any extra data.

* accepted by AAAI 2023 workshop
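
SWA keeps an equal-weight running average of weight snapshots taken during training (for example, once per epoch). A minimal sketch over flat weight lists; the class name and snapshot schedule are assumptions. In a real network, BatchNorm statistics have to be recomputed for the averaged weights (PyTorch provides `torch.optim.swa_utils.update_bn` for this).

```python
class WeightAverager:
    """Equal-weight running average of weight snapshots, as in SWA."""

    def __init__(self):
        self.avg = None  # averaged weights so far
        self.n = 0       # number of snapshots averaged

    def update(self, weights):
        if self.avg is None:
            self.avg = list(weights)
        else:
            # incremental mean: avg_{n+1} = (n * avg_n + w) / (n + 1)
            self.avg = [(a * self.n + w) / (self.n + 1) for a, w in zip(self.avg, weights)]
        self.n += 1
        return self.avg
```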

PECAN: A Product-Quantized Content Addressable Memory Network

Aug 13, 2022

Jie Ran, Rui Lin, Jason Chun Lok Li, Jiajun Zhou, Ngai Wong

Abstract: A novel deep neural network (DNN) architecture is proposed wherein the filtering and linear transform are realized solely with product quantization (PQ). This results in a natural implementation via content addressable memory (CAM), which transcends regular DNN layer operations and requires only simple table lookup. Two schemes are developed for end-to-end PQ prototype training, based on angle- and distance-based similarities respectively, which differ in their multiplicative and additive natures and offer different complexity-accuracy tradeoffs. Moreover, the distance-based scheme constitutes a truly multiplier-free DNN solution. Experiments confirm the feasibility of such a Product-Quantized Content Addressable Memory Network (PECAN), which has strong implications for hardware-efficient deployment, especially for in-memory computing.
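
A minimal sketch of PQ-based inference for a single output neuron, assuming L1 distance for the distance-based scheme: the input is split into subvectors, each is replaced by the index of its nearest prototype, and the dot product with the weights becomes a sum of precomputed table entries. Encoding uses only subtractions, absolute values and comparisons, and the lookup uses only additions (the tables are built offline), which is one sense in which inference can be multiplier-free. Codebooks here are fixed by hand rather than trained, and all names are hypothetical.

```python
def split(v, m):
    """Split v into m equal-length subvectors."""
    d = len(v) // m
    return [v[i * d:(i + 1) * d] for i in range(m)]


def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def encode(x, codebooks):
    """Index of the nearest prototype (L1 distance) in each subspace."""
    subs = split(x, len(codebooks))
    return [min(range(len(cb)), key=lambda j: l1(s, cb[j])) for s, cb in zip(subs, codebooks)]


def build_tables(w, codebooks):
    """Offline: tables[m][j] = w_m . prototype_j for subspace m."""
    ws = split(w, len(codebooks))
    return [[sum(a * b for a, b in zip(wm, p)) for p in cb] for wm, cb in zip(ws, codebooks)]


def pq_dot(x, tables, codebooks):
    """Approximate w . x with table lookups and additions only."""
    return sum(tables[m][j] for m, j in enumerate(encode(x, codebooks)))
```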

Coarse to Fine: Image Restoration Boosted by Multi-Scale Low-Rank Tensor Completion

Mar 29, 2022

Rui Lin, Cong Chen, Ngai Wong

Abstract: Existing low-rank tensor completion (LRTC) approaches aim at restoring a partially observed tensor by imposing a global low-rank constraint on the underlying completed tensor. However, such a global rank assumption suffers from a trade-off between restoring the originally detail-lacking parts and neglecting the potentially complex objects, making the completion performance unsatisfactory on both sides. To address this problem, we propose a novel and practical strategy for image restoration that restores the partially observed tensor in a coarse-to-fine (C2F) manner, which avoids this trade-off by searching for proper local ranks for both low- and high-rank parts. Extensive experiments demonstrate the superiority of the proposed C2F scheme. The code is available at: https://github.com/RuiLin0212/C2FLRTC.
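
For intuition on the global low-rank baseline that C2F improves on, here is a minimal matrix (2-way) sketch, not the C2F method itself: a rank-1 fit u v^T is learned from the observed entries by alternating least squares and used to fill the missing ones. C2F would instead search for suitable local ranks over image regions; the fixed rank, iteration count and function name below are assumptions.

```python
def rank1_complete(M, mask, iters=200):
    """Fill unobserved entries of M (mask[i][j] is False there; M's value is ignored)
    with a rank-1 model u v^T fitted to the observed entries by alternating least squares."""
    n_rows, n_cols = len(M), len(M[0])
    u = [1.0] * n_rows
    v = [1.0] * n_cols
    for _ in range(iters):
        # least-squares update of each u_i with v fixed (observed entries only)
        for i in range(n_rows):
            num = sum(M[i][j] * v[j] for j in range(n_cols) if mask[i][j])
            den = sum(v[j] ** 2 for j in range(n_cols) if mask[i][j])
            u[i] = num / den if den else 0.0
        # least-squares update of each v_j with u fixed
        for j in range(n_cols):
            num = sum(M[i][j] * u[i] for i in range(n_rows) if mask[i][j])
            den = sum(u[i] ** 2 for i in range(n_rows) if mask[i][j])
            v[j] = num / den if den else 0.0
    # keep observed entries, fill the rest from the low-rank model
    return [[M[i][j] if mask[i][j] else u[i] * v[j] for j in range(n_cols)]
            for i in range(n_rows)]
```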

Deformable Butterfly: A Highly Structured and Sparse Linear Transform

Mar 25, 2022

Rui Lin, Jie Ran, King Hung Chiu, Graziano Chesi, Ngai Wong

Abstract: We introduce a new kind of linear transform named Deformable Butterfly (DeBut) that generalizes conventional butterfly matrices and can be adapted to various input-output dimensions. It inherits the fine-to-coarse-grained learnable hierarchy of traditional butterflies, and when DeBut is deployed in neural networks, the prominent structure and sparsity of a DeBut layer constitute a new way of network compression. We apply DeBut as a drop-in replacement for standard fully connected and convolutional layers, and demonstrate its superiority in homogenizing a neural network and giving it favorable properties such as light weight and low inference complexity, without compromising accuracy. The natural complexity-accuracy tradeoff arising from the myriad deformations of a DeBut layer also opens up new room for analytical and practical research. The code and appendix are publicly available at: https://github.com/ruilin0212/DeBut.
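
For reference, a sketch of the conventional butterfly that DeBut generalizes, assuming a square n x n transform with n a power of two: log2(n) sparse factors, each mixing index pairs (i, i + 2^s) with a learnable 2 x 2 block, so a matrix-vector product costs O(n log n) and each factor stores 2n nonzeros. Fixing every block to [[1, 1], [1, -1]] yields the Walsh-Hadamard transform, which makes a convenient check. DeBut's deformed factors for arbitrary input-output sizes are not reproduced here; all names are illustrative.

```python
import math


def butterfly_apply(x, stages):
    """Apply a product of butterfly factors to x (len(x) = n = 2**m).

    stages[s] holds n // 2 tuples (a, b, c, d): in factor s, each index pair
    (i, i + h) with h = 2**s is mixed by the 2x2 block [[a, b], [c, d]].
    """
    n = len(x)
    y = list(x)
    for s, coeffs in enumerate(stages):
        h = 2 ** s
        out = [0.0] * n
        idx = 0
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b, c, d = coeffs[idx]
                idx += 1
                out[i] = a * y[i] + b * y[i + h]
                out[i + h] = c * y[i] + d * y[i + h]
        y = out
    return y


def hadamard_stages(n):
    """Fixed coefficients that turn the butterfly into the Walsh-Hadamard transform."""
    m = int(math.log2(n))
    return [[(1, 1, 1, -1)] * (n // 2) for _ in range(m)]


def param_counts(n):
    """(dense n x n entries, butterfly nonzeros = 2n per factor * log2 n factors)."""
    m = int(math.log2(n))
    return n * n, 4 * (n // 2) * m
```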

What Do Adversarially trained Neural Networks Focus: A Fourier Domain-based Study

Mar 16, 2022

Binxiao Huang, Chaofan Tao, Rui Lin, Ngai Wong

Abstract: Although many fields have witnessed the superior performance brought about by deep learning, the robustness of neural networks remains an open issue. Specifically, a small adversarial perturbation on the input may cause the model to produce a completely different output. Such poor robustness implies many potential hazards, especially in security-critical applications, e.g., autonomous driving and mobile robotics. This work studies what information adversarially trained models focus on. Empirically, we notice that the differences between clean and adversarial data are mainly distributed in the low-frequency region. We then find that an adversarially trained model is more robust than its naturally trained counterpart because the former pays more attention to learning the dominant information in low-frequency components. In addition, we consider two common ways to improve model robustness, namely data augmentation and stronger network architectures, and interpret these techniques from a frequency-domain perspective. We hope this work can shed light on the design of more robust neural networks.
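
A small sketch of how one might quantify the "low-frequency region" observation, assuming a 1-D signal for brevity (images would use a 2-D DFT): compute the share of spectral energy at frequencies up to a cutoff. Applied to the difference between an adversarial and a clean input, a value near 1 would indicate a perturbation concentrated at low frequencies. The cutoff and function names are hypothetical.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a real sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
            for f in range(n)]


def low_freq_energy_fraction(x, cutoff):
    """Share of spectral energy at frequencies with min(f, n - f) <= cutoff.

    min(f, n - f) folds negative frequencies onto their positive counterparts.
    """
    n = len(x)
    energy = [abs(z) ** 2 for z in dft(x)]
    total = sum(energy)
    low = sum(e for f, e in enumerate(energy) if min(f, n - f) <= cutoff)
    return low / total if total else 0.0
```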
