Fenglei Fan

Rethink Deep Learning with Invariance in Data Representation

Dec 06, 2024

Shuren Qi, Fei Wang, Tieyong Zeng, Fenglei Fan

Abstract: Integrating invariance into data representations is a principled design in intelligent systems and web applications. Representations play a fundamental role, as systems and applications are both built on meaningful representations of digital inputs (rather than the raw data). In fact, the proper design/learning of such representations relies on priors w.r.t. the task of interest. Here, the concept of symmetry from the Erlangen Program may be the most fruitful prior: informally, a symmetry of a system is a transformation that leaves a certain property of the system invariant. Symmetry priors are ubiquitous, e.g., translation is a symmetry of object classification, where the object category is invariant under translation. The quest for invariance is as old as pattern recognition and data mining itself. Invariant design was the cornerstone of various representations in the era before deep learning, such as SIFT. In the early era of deep learning, the invariance principle was largely ignored and replaced by a data-driven paradigm, such as the CNN. However, this neglect did not last long before data-driven models encountered bottlenecks regarding robustness, interpretability, efficiency, and so on. The invariance principle has returned in the era of rethinking deep learning, forming a new field known as Geometric Deep Learning (GDL). In this tutorial, we will give a historical perspective on invariance in data representations. More importantly, we will identify research dilemmas, promising works, future directions, and web applications.

* Accepted by WWW 2025 for a tutorial

Towards Secure Tuning: Mitigating Security Risks Arising from Benign Instruction Fine-Tuning

Oct 06, 2024

Yanrui Du, Sendong Zhao, Jiawei Cao, Ming Ma, Danyang Zhao, Fenglei Fan, Ting Liu, Bing Qin

Abstract: Instruction Fine-Tuning (IFT) has become an essential method for adapting base Large Language Models (LLMs) into variants for professional and private use. However, researchers have raised concerns over a significant decrease in LLMs' security following IFT, even when the IFT process involves entirely benign instructions (termed Benign IFT). Our study represents a pioneering effort to mitigate the security risks arising from Benign IFT. Specifically, we conduct a Module Robustness Analysis to investigate how LLMs' internal modules contribute to their security. Based on our analysis, we propose a novel IFT strategy, called the Modular Layer-wise Learning Rate (ML-LR) strategy. In our analysis, we implement a simple security feature classifier that serves as a proxy to measure the robustness of modules (e.g., $Q$/$K$/$V$). Our findings reveal that module robustness shows clear patterns, varying regularly with the module type and the layer depth. Leveraging these insights, we develop a proxy-guided search algorithm to identify a robust subset of modules, termed Mods$_{Robust}$. During IFT, the ML-LR strategy employs differentiated learning rates for Mods$_{Robust}$ and the remaining modules. Our experimental results show that, in security assessments, applying our ML-LR strategy significantly mitigates the rise in harmfulness of LLMs following Benign IFT. Notably, our ML-LR strategy has little impact on the usability or expertise of LLMs following Benign IFT. Furthermore, we have conducted comprehensive analyses to verify the soundness and flexibility of our ML-LR strategy.
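
The idea of giving a robust subset of modules a different learning rate from the remaining modules can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the module names, the `robust` set, the helper `assign_learning_rates`, and both learning-rate values are hypothetical, and the abstract does not say which group receives the larger rate.

```python
def assign_learning_rates(module_names, robust_modules, lr_robust, lr_rest):
    """Map each module name to a learning rate based on group membership."""
    return {
        name: (lr_robust if name in robust_modules else lr_rest)
        for name in module_names
    }


# Hypothetical module names of the form "layer<depth>.<type>".
modules = ["layer0.q", "layer0.k", "layer0.v", "layer1.q", "layer1.k", "layer1.v"]
# Hypothetical outcome of a search for robust modules (assumption, not from the paper).
robust = {"layer0.q", "layer1.v"}

# Both rates are placeholders; which group should get the larger one is not stated.
rates = assign_learning_rates(modules, robust, lr_robust=1e-5, lr_rest=1e-6)
for name in modules:
    print(name, rates[name])
```

In a training framework, a mapping like this would typically be turned into one optimizer parameter group per learning rate.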

Hyper-Compression: Model Compression via Hyperfunction

Sep 01, 2024

Fenglei Fan, Juntong Fan, Dayang Wang, Jingbo Zhang, Zelin Dong, Shijun Zhang, Ge Wang, Tieyong Zeng

Abstract: The rapid growth of large models' size has far outpaced that of GPU memory. To bridge this gap, inspired by the succinct relationship between genotype and phenotype, we turn the model compression problem into a problem of parameter representation and propose the so-called hyper-compression. Hyper-compression uses a hyperfunction to represent the parameters of the target network; notably, the hyperfunction is designed according to ergodic theory, which relates to the question of whether a low-dimensional dynamic system can eventually fill a high-dimensional space. Empirically, the proposed hyper-compression enjoys the following merits: 1) Preferable compression ratio; 2) No post-hoc retraining; 3) Affordable inference time; and 4) Short compression time. It compresses LLaMA2-7B in an hour and achieves close-to-int4-quantization performance, without retraining and with a performance drop of less than 1%. Our work has the potential to invigorate the field of model compression, towards a harmony between the scaling law and the stagnation of hardware upgrades.
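
The question of whether a low-dimensional system can fill a higher-dimensional space can be illustrated with a classical toy example. The sketch below is not the paper's hyperfunction, which the abstract does not specify. It assumes an irrational rotation on the unit square, where the points `(k * sqrt(2) mod 1, k * sqrt(3) mod 1)` for integer `k` are dense by Kronecker's theorem, so a single integer can stand in for a pair of values up to some error. The directions, the target pair, and the brute-force search are all illustrative choices.

```python
import math


def frac(x):
    """Fractional part of a non-negative x, in [0, 1)."""
    return x - math.floor(x)


def trajectory_point(k, directions):
    """Point visited at integer step k: (k * a mod 1) for each direction a."""
    return tuple(frac(k * a) for a in directions)


def best_step(target, directions, n_steps):
    """Brute-force search for the step whose point is closest to target."""
    best_k, best_err = 0, float("inf")
    for k in range(n_steps):
        point = trajectory_point(k, directions)
        err = max(abs(p - t) for p, t in zip(point, target))
        if err < best_err:
            best_k, best_err = k, err
    return best_k, best_err


# Illustrative directions: 1, sqrt(2), sqrt(3) are rationally independent.
directions = (math.sqrt(2), math.sqrt(3))
# Two hypothetical parameters, already scaled into [0, 1).
target = (0.25, 0.70)

for n_steps in (10, 100, 1000, 10000):
    k, err = best_step(target, directions, n_steps)
    print(n_steps, k, round(err, 4))
```

Because each larger search range contains the smaller ones, the reported error cannot increase as `n_steps` grows, which is the one-dimensional-fills-two-dimensions effect in miniature.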

Grounding and Enhancing Grid-based Models for Neural Fields

Apr 06, 2024

Zelin Zhao, Fenglei Fan, Wenlong Liao, Junchi Yan

Abstract: Many contemporary studies utilize grid-based models for neural field representation, but a systematic analysis of grid-based models is still missing, hindering the improvement of those models. Therefore, this paper introduces a theoretical framework for grid-based models. This framework points out that these models' approximation and generalization behaviors are determined by grid tangent kernels (GTK), which are intrinsic properties of grid-based models. The proposed framework facilitates a consistent and systematic analysis of diverse grid-based models. Furthermore, the introduced framework motivates the development of a novel grid-based model named the Multiplicative Fourier Adaptive Grid (MulFAGrid). The numerical analysis demonstrates that MulFAGrid exhibits a lower generalization bound than its predecessors, indicating its robust generalization performance. Empirical studies reveal that MulFAGrid achieves state-of-the-art performance in various tasks, including 2D image fitting, 3D signed distance field (SDF) reconstruction, and novel view synthesis, demonstrating superior representation ability. The project website is available at https://sites.google.com/view/cvpr24-2034-submission/home.

* Accepted in CVPR24 as an oral presentation. Pre-rebuttal scores: 555. Post-rebuttal scores: 555

Enhancing the Performance of Neural Networks Through Causal Discovery and Integration of Domain Knowledge

Dec 01, 2023

Xiaoge Zhang, Xiao-Lin Wang, Fenglei Fan, Yiu-Ming Cheung, Indranil Bose

Abstract: In this paper, we develop a generic methodology to encode the hierarchical causality structure among observed variables into a neural network in order to improve its predictive performance. The proposed methodology, called causality-informed neural network (CINN), leverages three coherent steps to systematically map the structural causal knowledge into the layer-to-layer design of the neural network while strictly preserving the orientation of every causal relationship. In the first step, CINN discovers causal relationships from observational data via directed acyclic graph (DAG) learning, where causal discovery is recast as a continuous optimization problem to avoid a combinatorial search. In the second step, the discovered hierarchical causality structure among observed variables is systematically encoded into the neural network through a dedicated architecture and customized loss function. By categorizing variables in the causal DAG as root, intermediate, and leaf nodes, the hierarchical causal DAG is translated into CINN with a one-to-one correspondence between nodes in the causal DAG and units in the CINN while maintaining the relative order among these nodes. Regarding the loss function, both intermediate and leaf nodes in the DAG are treated as target outputs during CINN training so as to drive co-learning of causal relationships among different types of nodes. As multiple loss components emerge in CINN, we leverage the projection of conflicting gradients to mitigate gradient interference among the multiple learning tasks. Computational experiments across a broad spectrum of UCI data sets demonstrate substantial advantages of CINN in predictive performance over other state-of-the-art methods. In addition, an ablation study underscores the value of integrating structural and quantitative causal knowledge in enhancing the neural network's predictive performance incrementally.
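
The projection of conflicting gradients mentioned above can be sketched for two tasks. The abstract names the technique but gives no details, so the rule below is the commonly used form and is an assumption here: when two task gradients have a negative inner product, the component of one along the other is removed. The function names and the example gradients are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def project_conflicting(g_i, g_j):
    """Return g_i with its component along g_j removed if the two conflict."""
    d = dot(g_i, g_j)
    norm_sq = dot(g_j, g_j)
    if d >= 0 or norm_sq == 0:
        # No conflict (or a zero gradient): leave g_i unchanged.
        return list(g_i)
    scale = d / norm_sq
    return [a - scale * b for a, b in zip(g_i, g_j)]


# Hypothetical gradients of two loss components with a negative inner product.
g_leaf = [1.0, 0.0]
g_mid = [-1.0, 1.0]

projected = project_conflicting(g_leaf, g_mid)
print(projected)                # [0.5, 0.5]
print(dot(projected, g_mid))    # 0.0
```

After projection the adjusted gradient is orthogonal to the other task's gradient, so a step along it no longer directly opposes that task.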

CTformer: Convolution-free Token2Token Dilated Vision Transformer for Low-dose CT Denoising

Feb 28, 2022

Dayang Wang, Fenglei Fan, Zhan Wu, Rui Liu, Fei Wang, Hengyong Yu

Abstract: Low-dose computed tomography (LDCT) denoising is an important problem in CT research. Compared to normal-dose CT (NDCT), LDCT images are subject to severe noise and artifacts. In many recent studies, vision transformers have shown superior feature representation ability over convolutional neural networks (CNNs). However, unlike CNNs, the potential of vision transformers in LDCT denoising has been little explored so far. To fill this gap, we propose a Convolution-free Token2Token Dilated Vision Transformer for low-dose CT denoising. The CTformer uses a more powerful token rearrangement to encompass local contextual information and thus avoids convolution. It also dilates and shifts feature maps to capture longer-range interaction. We interpret the CTformer by statically inspecting patterns of its internal attention maps and dynamically tracing the hierarchical attention flow with an explanatory graph. Furthermore, an overlapped inference mechanism is introduced to effectively eliminate the boundary artifacts that are common for encoder-decoder-based denoising models. Experimental results on the Mayo LDCT dataset suggest that the CTformer outperforms the state-of-the-art denoising methods with a low computation overhead.

* 11 pages, 14 figures

Low-dimensional Manifold Constrained Disentanglement Network for Metal Artifact Reduction

Jul 08, 2020

Chuang Niu, Wenxiang Cong, Fenglei Fan, Hongming Shan, Mengzhou Li, Jimin Liang, Ge Wang

Abstract: Deep neural network based methods have achieved promising results for CT metal artifact reduction (MAR), most of which use many synthesized paired images for training. As synthesized metal artifacts in CT images may not accurately reflect their clinical counterparts, an artifact disentanglement network (ADN) was proposed that uses unpaired clinical images directly, producing promising results on clinical datasets. However, without sufficient supervision, it is difficult for ADN to recover structural details of artifact-affected CT images based on adversarial losses only. To overcome this problem, here we propose a low-dimensional manifold (LDM) constrained disentanglement network (DN), leveraging the image characteristic that the patch manifold is generally low-dimensional. Specifically, we design an LDM-DN learning algorithm to empower the disentanglement network by optimizing the synergistic network loss functions while constraining the recovered images to be on a low-dimensional patch manifold. Moreover, learning from both paired and unpaired data, an efficient hybrid optimization scheme is proposed to further improve the MAR performance on clinical datasets. Extensive experiments demonstrate that the proposed LDM-DN approach can consistently improve the MAR performance in paired and/or unpaired learning settings, outperforming competing methods on synthesized and clinical datasets.

On Interpretability of Artificial Neural Networks

Jan 08, 2020

Fenglei Fan, Jinjun Xiong, Ge Wang

Abstract: Deep learning has achieved great successes in many important areas dealing with text, images, video, graphs, and so on. However, the black-box nature of deep artificial neural networks has become the primary obstacle to their public acceptance and wide popularity in critical applications such as diagnosis and therapy. Due to the huge potential of deep learning, interpreting neural networks has become one of the most critical research directions. In this paper, we systematically review recent studies in understanding the mechanism of neural networks and shed light on some future directions of interpretability research (this work is still in progress).

Quadratic Autoencoder for Low-Dose CT Denoising

Jan 17, 2019

Fenglei Fan, Hongming Shan, Ge Wang

Abstract: Recently, deep learning has transformed many fields including medical imaging. Inspired by the diversity of biological neurons, our group proposed quadratic neurons in which the inner product in current artificial neurons is replaced with a quadratic operation on inputs, thereby enhancing the capability of an individual neuron. Along this direction, we are motivated to evaluate the power of quadratic neurons in representative network architectures, towards quadratic neuron based deep learning. In this regard, our prior theoretical studies have shown important merits of quadratic neurons and networks. In this paper, we use quadratic neurons to construct an encoder-decoder structure, referred to as the quadratic auto-encoder, and apply it to low-dose CT de-noising. Then, we perform experiments on the Mayo low-dose CT dataset to demonstrate that the quadratic auto-encoder yields a better de-noising performance.
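
To make the contrast with a conventional neuron concrete, the sketch below compares an inner-product pre-activation with one possible quadratic operation. The abstract does not give the formula, so the form used here (a product of two affine terms plus a weighted sum of squared inputs) is an assumption, and all parameter names and values are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def conventional_neuron(x, w, b):
    """Pre-activation of a standard neuron: an inner product plus a bias."""
    return dot(w, x) + b


def quadratic_neuron(x, w_r, b_r, w_g, b_g, w_b, c):
    """Pre-activation of a quadratic neuron (assumed form, see lead-in)."""
    squared = [v * v for v in x]
    return (dot(w_r, x) + b_r) * (dot(w_g, x) + b_g) + dot(w_b, squared) + c


x = [1.0, 2.0]

# A conventional neuron can only form a weighted sum of the inputs.
print(conventional_neuron(x, w=[0.5, -1.0], b=0.1))

# With these hypothetical weights the quadratic neuron computes x[0] * x[1].
product = quadratic_neuron(
    x, w_r=[1.0, 0.0], b_r=0.0, w_g=[0.0, 1.0], b_g=0.0, w_b=[0.0, 0.0], c=0.0
)
print(product)  # 2.0
```

Under this assumed form, setting `w_g` and `w_b` to zero and `b_g` to one recovers the conventional neuron, so the quadratic unit is a strict generalization in this sketch.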

Soft-Autoencoder and Its Wavelet Shrinkage Interpretation

Dec 31, 2018

Fenglei Fan, Mengzhou Li, Yueyang Teng, Ge Wang

Abstract: Deep learning is a main focus of artificial intelligence and has greatly impacted other fields. However, deep learning is often criticized for its lack of interpretability. As successful unsupervised models in deep learning, autoencoders, especially convolutional autoencoders, are very popular and important. Since these autoencoders need improvements and insights, in this paper we shed light on the nonlinearity of a deep convolutional autoencoder from the perspective of perfect signal recovery. In particular, we propose a new type of convolutional autoencoder, termed Soft-Autoencoder (Soft-AE), in which the activations of encoding layers are implemented with adaptable soft-thresholding units while decoding layers are realized with linear units. Consequently, Soft-AE can be naturally interpreted as a learned cascaded wavelet shrinkage system. Our denoising numerical experiments on CIFAR-10, BSD-300 and the Mayo Clinical Challenge Dataset demonstrate that Soft-AE gives a competitive performance relative to its counterparts.
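
The soft-thresholding unit at the heart of this interpretation is the standard wavelet shrinkage operator, sketched below. In Soft-AE the threshold is described as adaptable; here it is a fixed, hypothetical value chosen only for illustration, and the coefficient list is made up.

```python
def soft_threshold(x, t):
    """Shrink x toward zero by t; values with |x| <= t become zero."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


# Hypothetical coefficients; small ones are treated as noise and zeroed.
coeffs = [-2.0, -0.3, 0.0, 0.4, 1.5]
shrunk = [soft_threshold(c, 0.5) for c in coeffs]
print(shrunk)  # [-1.5, 0.0, 0.0, 0.0, 1.0]
```

Small coefficients are set to zero and large ones are pulled toward zero by the threshold, which is why a stack of such units followed by linear decoding reads as a cascaded shrinkage system.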
