Abstract:The memory challenges associated with training Large Language Models (LLMs) have become a critical concern, particularly when using the Adam optimizer. To address this issue, numerous memory-efficient techniques have been proposed, with GaLore standing out as a notable example designed to reduce the memory footprint of optimizer states. However, these approaches do not alleviate the memory burden imposed by activations, rendering them unsuitable for scenarios involving long context sequences or large mini-batches. Moreover, their convergence properties are still not well-understood in the literature. In this work, we introduce a Randomized Subspace Optimization framework for pre-training and fine-tuning LLMs. Our approach decomposes the high-dimensional training problem into a series of lower-dimensional subproblems. At each iteration, a random subspace is selected, and the parameters within that subspace are optimized. This structured reduction in dimensionality allows our method to simultaneously reduce memory usage for both activations and optimizer states. We establish comprehensive convergence guarantees and derive rates for various scenarios, accommodating different optimization strategies to solve the subproblems. Extensive experiments validate the superior memory and communication efficiency of our method, achieving performance comparable to GaLore and Adam.
Abstract:Parameter-efficient tuning methods such as LoRA could achieve comparable performance to model tuning by tuning a small portion of the parameters. However, substantial computational resources are still required, as this process involves calculating gradients and performing back-propagation throughout the model. Much effort has recently been devoted to utilizing the derivative-free optimization method to eschew the computation of gradients and showcase an augmented level of robustness in few-shot settings. In this paper, we prepend the low-rank modules into each self-attention layer of the model and employ two derivative-free optimization methods to optimize these low-rank modules at each layer alternately. Extensive results on various tasks and language models demonstrate that our proposed method achieves substantial improvement and exhibits clear advantages in memory usage and convergence speed compared to existing gradient-based parameter-efficient tuning and derivative-free optimization methods in few-shot settings.
Abstract:Quantitative susceptibility mapping (QSM) is a post-processing technique for deriving tissue magnetic susceptibility distribution from MRI phase measurements. Deep learning (DL) algorithms hold great potential for solving the ill-posed QSM reconstruction problem. However, a significant challenge facing current DL-QSM approaches is their limited adaptability to magnetic dipole field orientation variations during training and testing. In this work, we propose a novel Orientation-Adaptive Latent Feature Editing (OA-LFE) module to learn the encoding of acquisition orientation vectors and seamlessly integrate them into the latent features of deep networks. Importantly, it can be directly Plug-and-Play (PnP) into various existing DL-QSM architectures, enabling reconstructions of QSM from arbitrary magnetic dipole orientations. Its effectiveness is demonstrated by combining the OA-LFE module into our previously proposed phase-to-susceptibility single-step instant QSM (iQSM) network, which was initially tailored for pure-axial acquisitions. The proposed OA-LFE-empowered iQSM, which we refer to as iQSM+, is trained in a self-supervised manner on a specially-designed simulation brain dataset. Comprehensive experiments are conducted on simulated and in vivo human brain datasets, encompassing subjects ranging from healthy individuals to those with pathological conditions. These experiments involve various MRI platforms (3T and 7T) and aim to compare our proposed iQSM+ against several established QSM reconstruction frameworks, including the original iQSM. The iQSM+ yields QSM images with significantly improved accuracies and mitigates artifacts, surpassing other state-of-the-art DL-QSM algorithms.
Abstract:The data-driven approach of supervised learning methods has limited applicability in solving dipole inversion in Quantitative Susceptibility Mapping (QSM) with varying scan parameters across different objects. To address this generalization issue in supervised QSM methods, we propose a novel training-free model-based unsupervised method called MoDIP (Model-based Deep Image Prior). MoDIP comprises a small, untrained network and a Data Fidelity Optimization (DFO) module. The network converges to an interim state, acting as an implicit prior for image regularization, while the optimization process enforces the physical model of QSM dipole inversion. Experimental results demonstrate MoDIP's excellent generalizability in solving QSM dipole inversion across different scan parameters. It exhibits robustness against pathological brain QSM, achieving over 32% accuracy improvement than supervised deep learning and traditional iterative methods. It is also 33% more computationally efficient and runs 4 times faster than conventional DIP-based approaches, enabling 3D high-resolution image reconstruction in under 4.5 minutes.
Abstract:Extracting and fusing part features have become the key of fined-grained image recognition. Recently, Non-local (NL) module has shown excellent improvement in image recognition. However, it lacks the mechanism to model the interactions between multi-scale part features, which is vital for fine-grained recognition. In this paper, we propose a novel cross-layer non-local (CNL) module to associate multi-scale receptive fields by two operations. First, CNL computes correlations between features of a query layer and all response layers. Second, all response features are weighted according to the correlations and are added to the query features. Due to the interactions of cross-layer features, our model builds spatial dependencies among multi-level layers and learns more discriminative features. In addition, we can reduce the aggregation cost if we set low-dimensional deep layer as query layer. Experiments are conducted to show our model achieves or surpasses state-of-the-art results on three benchmark datasets of fine-grained classification. Our codes can be found at github.com/FouriYe/CNL-ICIP2020.