Abstract: Parameter-efficient fine-tuning (PEFT) is an effective method for adapting pre-trained vision models to downstream tasks by tuning a small subset of parameters. Among PEFT methods, sparse tuning achieves superior performance by adjusting only the weights most relevant to downstream tasks rather than densely tuning the whole weight matrix. However, this performance improvement comes with increased memory usage, which stems from two factors: storing the whole weight matrix as learnable parameters in the optimizer and additionally storing the indexes of the tunable weights. In this paper, we propose SNELL (Sparse tuning with kerNELized LoRA), a method for sparse tuning with low memory usage. To achieve low memory usage, SNELL decomposes the tunable matrix for sparsification into two learnable low-rank matrices, avoiding the costly storage of the whole original matrix. A competition-based sparsification mechanism is further proposed to avoid storing the indexes of tunable weights. To maintain the effectiveness of sparse tuning with low-rank matrices, we extend the low-rank decomposition by applying nonlinear kernel functions to the whole-matrix merging. This increases the rank of the merged matrix, enhancing the ability of SNELL to adapt pre-trained models to downstream tasks. Extensive experiments on multiple downstream tasks show that SNELL achieves state-of-the-art performance with low memory usage, extending sparse tuning to large-scale pre-trained models. Code is available at https://github.com/ssfgunner/SNELL.
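The sketch below is a rough illustration of the idea in this abstract: a low-rank update whose factors are merged through a nonlinear (polynomial) kernel, followed by magnitude-based competition for sparsity. It is a minimal sketch under assumed choices (polynomial kernel, top-k magnitude rule); `SNELLLinear` and its hyperparameters are illustrative names, not the authors' released API.

```python
# Minimal sketch of the SNELL idea: kernelized LoRA merge + competition-based
# sparsification. Only the two low-rank factors are learnable, and no index
# set is stored for the sparse mask.
import torch
import torch.nn as nn


class SNELLLinear(nn.Module):  # hypothetical name, not the official API
    def __init__(self, weight: torch.Tensor, rank: int = 8,
                 sparsity: float = 0.9, degree: int = 2):
        super().__init__()
        out_dim, in_dim = weight.shape
        self.weight = nn.Parameter(weight, requires_grad=False)   # frozen W
        # Low-rank factors are the only learnable parameters.
        self.A = nn.Parameter(torch.randn(out_dim, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, in_dim))
        self.sparsity = sparsity
        self.degree = degree

    def delta(self) -> torch.Tensor:
        # Nonlinear (polynomial) kernel merging instead of a plain A @ B,
        # which raises the rank of the merged update matrix.
        linear = self.A @ self.B
        delta = (linear + 1.0).pow(self.degree) - 1.0
        # Competition-based sparsification: entries compete by magnitude and
        # only the top (1 - sparsity) fraction survives; the mask is recomputed
        # on the fly, so no tunable-weight indexes need to be stored.
        k = int(delta.numel() * (1.0 - self.sparsity))
        thresh = delta.abs().flatten().kthvalue(delta.numel() - k).values
        return delta * (delta.abs() > thresh)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ (self.weight + self.delta()).t()
```

With `B` initialised to zero the merged update is exactly zero at the start of training (since (0 + 1)^d - 1 = 0), so fine-tuning begins from the frozen pre-trained weights, mirroring the usual LoRA initialisation.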
Abstract: Message passing plays a vital role in graph neural networks (GNNs) for effective feature learning. However, over-reliance on the input topology diminishes the efficacy of message passing and restricts the capability of GNNs. Despite efforts to mitigate this reliance, existing studies either encounter message-passing bottlenecks or incur high computational expense, which calls for flexible message passing with low complexity. In this paper, we propose a novel dynamic message-passing mechanism for GNNs. It projects graph nodes and learnable pseudo nodes into a common space with measurable spatial relations between them. As nodes move in this space, their evolving relations facilitate flexible pathway construction for a dynamic message-passing process. By associating pseudo nodes with input graphs through these measured relations, graph nodes can communicate with each other via pseudo nodes as intermediaries under linear complexity. We further develop a GNN model named $\mathtt{\mathbf{N^2}}$ based on our dynamic message-passing mechanism. $\mathtt{\mathbf{N^2}}$ employs a single recurrent layer to recursively generate node displacements and construct optimal dynamic pathways. Evaluation on eighteen benchmarks demonstrates the superior performance of $\mathtt{\mathbf{N^2}}$ over popular GNNs. $\mathtt{\mathbf{N^2}}$ successfully scales to large-scale benchmarks and, with its shared recurrent layer, requires significantly fewer parameters for graph classification.
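The following sketch illustrates why messaging through a small set of pseudo nodes stays linear in the number of graph nodes: each step measures spatial relations between graph nodes and pseudo nodes in a shared space, routes messages node → pseudo node → node, and then moves the nodes with a shared recurrent cell. `DynamicMessagePassing`, the distance-softmax relation, and the GRU-based displacement are illustrative assumptions, not the authors' exact formulation.

```python
# Minimal sketch of a dynamic message-passing step in the spirit of N^2:
# graph nodes and P learnable pseudo nodes share a common space, and
# messages flow node -> pseudo node -> node in O(N * P) instead of O(N^2).
import torch
import torch.nn as nn


class DynamicMessagePassing(nn.Module):  # hypothetical name
    def __init__(self, dim: int, num_pseudo: int = 16, space_dim: int = 8):
        super().__init__()
        self.pseudo_pos = nn.Parameter(torch.randn(num_pseudo, space_dim))
        self.pseudo_feat = nn.Parameter(torch.zeros(num_pseudo, dim))
        self.to_space = nn.Linear(dim, space_dim)   # project nodes into the common space
        self.displace = nn.GRUCell(dim, space_dim)  # shared recurrent displacement generator
        self.update = nn.Linear(2 * dim, dim)

    def forward(self, x: torch.Tensor, steps: int = 2) -> torch.Tensor:
        # x: [N, dim] node features
        pos = self.to_space(x)                      # initial node positions
        h = x
        for _ in range(steps):
            # Spatial relations between graph nodes and pseudo nodes.
            dist = torch.cdist(pos, self.pseudo_pos)     # [N, P]
            rel = torch.softmax(-dist, dim=-1)
            # node -> pseudo -> node messaging under linear complexity.
            pseudo_msg = rel.t() @ h + self.pseudo_feat  # [P, dim]
            back_msg = rel @ pseudo_msg                  # [N, dim]
            h = torch.relu(self.update(torch.cat([h, back_msg], dim=-1)))
            # Nodes move: the shared recurrent cell emits a displacement,
            # so the relations (and hence the pathways) evolve each step.
            pos = pos + self.displace(h, pos)
        return h
```

Because the number of pseudo nodes is a fixed constant, the cost of one step grows linearly with the number of graph nodes, which is the property the abstract highlights for scaling to large benchmarks.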
Abstract: Designing effective graph neural networks (GNNs) with message passing poses two fundamental challenges: determining optimal message-passing pathways and designing local aggregators. Previous methods for designing optimal pathways suffer from information loss on the input features, while existing local aggregators generally fail to extract multi-scale features and to approximate diverse operators under limited parameter scales. In contrast, Euclidean convolution has been proven to be an expressive aggregator, making it a strong candidate for GNN construction. However, generalizing Euclidean convolution to graphs is challenging because of their irregular structure. To bridge the gap between Euclidean space and graph topology, we propose a differentiable method that applies permutations to calibrate input graphs for Euclidean convolution. The permutations arrange all nodes in a row regardless of their input order and therefore enable the flexible generalization of Euclidean convolution to graphs. Based on this graph calibration, we propose the Compressed Convolution Network (CoCN) for hierarchical graph representation learning. CoCN follows the local feature-learning and global parameter-sharing mechanisms of convolutional neural networks. The whole model can be trained end-to-end, with compressed convolution applied to learn individual node features and their corresponding structure features. CoCN can further borrow successful practices from Euclidean convolution, including residual connections and the inception mechanism. We validate CoCN on both node-level and graph-level benchmarks, where it achieves superior performance over competitive GNN baselines. Code is available at https://github.com/sunjss/CoCN.
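The sketch below illustrates the calibration idea in this abstract: learn per-node scores, relax them into a soft permutation (here via a SoftSort-style operator, used purely as an assumed stand-in), permute the features and adjacency into a fixed row order, and apply an ordinary 1D convolution along the calibrated node axis. `CalibratedConv` and its components are hypothetical, not CoCN's actual architecture.

```python
# Minimal sketch of calibrating a graph with a differentiable permutation so
# that ordinary Euclidean convolution can be applied along the node axis.
import torch
import torch.nn as nn


def soft_permutation(scores: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    # SoftSort-style relaxation (an assumption for illustration): row i softly
    # selects the node with the i-th largest score, giving a differentiable
    # permutation that is independent of the input node order.
    sorted_scores, _ = scores.sort(descending=True, dim=0)        # [N, 1]
    return torch.softmax(-(sorted_scores - scores.t()).abs() / tau, dim=-1)


class CalibratedConv(nn.Module):  # hypothetical module
    def __init__(self, dim: int, hidden: int = 64, kernel: int = 3):
        super().__init__()
        self.score = nn.Linear(dim, 1)                             # per-node ordering score
        self.conv = nn.Conv1d(dim, hidden, kernel, padding=kernel // 2)

    def forward(self, x: torch.Tensor, adj: torch.Tensor):
        # x: [N, dim] node features, adj: [N, N] adjacency.
        perm = soft_permutation(self.score(x))                     # [N, N] soft permutation
        x_row = perm @ x                                           # calibrated node sequence
        adj_row = perm @ adj @ perm.t()                            # calibrated structure
        # A standard 1D convolution now slides along the calibrated node axis.
        feats = self.conv(x_row.t().unsqueeze(0)).squeeze(0).t()   # [N, hidden]
        return feats, adj_row
```

Because the calibration is differentiable, the ordering scores and the convolution weights can be trained end-to-end, and practices such as residual connections can be stacked on top of the convolutional stem as in ordinary CNNs.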