Junlong Jia

On the Nonlinearity of Layer Normalization

Jun 03, 2024

Yunhao Ni, Yuxin Guo, Junlong Jia, Lei Huang

Abstract: Layer normalization (LN) is a ubiquitous technique in deep learning, but our theoretical understanding of it remains elusive. This paper investigates a new theoretical direction for LN: its nonlinearity and representation capacity. We investigate the representation capacity of a network built from a layerwise composition of linear and LN transformations, referred to as LN-Net. We theoretically show that, given $m$ samples with any label assignment, an LN-Net with only 3 neurons in each layer and $O(m)$ LN layers can correctly classify them. We further derive a lower bound on the VC dimension of an LN-Net. The nonlinearity of LN can be amplified by group partition, which we also demonstrate theoretically under mild assumptions and support empirically with experiments. Based on our analyses, we consider designing neural architectures that exploit and amplify the nonlinearity of LN, and our experiments support their effectiveness.

* 42 pages, accepted to ICML 2024
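
The abstract's central claim is that LN is a nonlinear map. A minimal sketch of what that means numerically is given below: it checks that LN (without learnable scale and shift) violates additivity and homogeneity, the two properties every linear map satisfies. The function names, the sample vectors, and the reading of "group partition" as applying LN independently to contiguous, equal-sized groups of neurons are assumptions made for illustration; this is not the paper's construction or code.

```python
import math

def layer_norm(x, eps=1e-12):
    """LN without learnable parameters: center x, then divide by its std.

    eps is a small constant (an illustrative choice) that avoids division
    by zero when all entries of x are equal.
    """
    mean = sum(x) / len(x)
    centered = [v - mean for v in x]
    var = sum(c * c for c in centered) / len(x)
    std = math.sqrt(var + eps)
    return [c / std for c in centered]

def group_layer_norm(x, num_groups):
    """Assumed form of group partition: LN applied to each contiguous group."""
    size = len(x) // num_groups
    out = []
    for g in range(num_groups):
        out.extend(layer_norm(x[g * size:(g + 1) * size]))
    return out

# Two arbitrary sample vectors (hypothetical data).
a = [1.0, 2.0, 4.0, 0.0]
b = [0.0, 3.0, -1.0, 5.0]
summed = [p + q for p, q in zip(a, b)]

# Additivity check: a linear map f satisfies f(a + b) == f(a) + f(b).
lhs = layer_norm(summed)
rhs = [p + q for p, q in zip(layer_norm(a), layer_norm(b))]
additivity_gap = max(abs(p - q) for p, q in zip(lhs, rhs))

# Homogeneity check: a linear map satisfies f(2a) == 2 f(a),
# whereas LN is scale-invariant, so LN(2a) is (almost exactly) LN(a).
doubled = layer_norm([2.0 * v for v in a])
scale_gap = max(abs(p - q) for p, q in zip(doubled, layer_norm(a)))

# Group-wise LN: each group is normalized with its own mean and std.
grouped = group_layer_norm(a, num_groups=2)

print("additivity gap:", additivity_gap)
print("scale-invariance gap:", scale_gap)
print("group-wise LN:", grouped)
```

A nonzero additivity gap together with a near-zero scale-invariance gap shows that LN is not linear. The group-wise variant normalizes each group separately, so it applies several independent normalizations within one layer.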

TinyLLaVA Factory: A Modularized Codebase for Small-scale Large Multimodal Models

May 20, 2024

Junlong Jia, Ying Hu, Xi Weng, Yiming Shi, Miao Li, Xingjian Zhang, Baichuan Zhou, Ziyu Liu, Jie Luo, Lei Huang(+1 more)

Abstract: We present TinyLLaVA Factory, an open-source modular codebase for small-scale large multimodal models (LMMs) with a focus on simplicity of code implementation, extensibility to new features, and reproducibility of training results. Following the design philosophy of the factory pattern in software engineering, TinyLLaVA Factory modularizes the entire system into interchangeable components, with each component integrating a suite of cutting-edge models and methods while leaving room for extension to more features. In addition to allowing users to customize their own LMMs, TinyLLaVA Factory provides popular training recipes that let users pretrain and finetune their models with less coding effort. Empirical experiments validate the effectiveness of our codebase. The goal of TinyLLaVA Factory is to assist researchers and practitioners in exploring the wide landscape of designing and training small-scale LMMs with affordable computational resources.

* Our codebase is made public at https://github.com/TinyLLaVA/TinyLLaVA_Factory with documentation available at https://tinyllava-factory.readthedocs.io/en/latest/
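
The factory pattern mentioned in the abstract can be sketched generically as a registry that maps names to interchangeable component classes. Everything in the block below (`CONNECTOR_REGISTRY`, `register_connector`, `build_connector`, and the two toy connector classes) is hypothetical and written only to illustrate the pattern; it is not TinyLLaVA Factory's actual API, which is described in the documentation linked above.

```python
from typing import Callable, Dict, List

# Hypothetical registry: maps a component name to the class that builds it.
CONNECTOR_REGISTRY: Dict[str, Callable[..., object]] = {}

def register_connector(name: str):
    """Class decorator that records a component under a string name."""
    def decorator(cls):
        CONNECTOR_REGISTRY[name] = cls
        return cls
    return decorator

@register_connector("identity")
class IdentityConnector:
    """Toy component: passes features through unchanged."""
    def __call__(self, features: List[float]) -> List[float]:
        return list(features)

@register_connector("scale")
class ScaleConnector:
    """Toy component: multiplies every feature by a fixed factor."""
    def __init__(self, factor: float = 2.0):
        self.factor = factor

    def __call__(self, features: List[float]) -> List[float]:
        return [self.factor * f for f in features]

def build_connector(name: str, **kwargs):
    """Factory function: look the name up and construct the component."""
    if name not in CONNECTOR_REGISTRY:
        raise KeyError(f"unknown connector: {name!r}")
    return CONNECTOR_REGISTRY[name](**kwargs)

# Swapping components only requires changing the name in a config.
config = {"connector": "scale", "kwargs": {"factor": 0.5}}
connector = build_connector(config["connector"], **config["kwargs"])
result = connector([2.0, 4.0])
print(result)
```

With this layout, adding a new component means writing one decorated class; the code that assembles the model only sees the name in the config, which is one way to get the interchangeability the abstract describes.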

TinyLLaVA: A Framework of Small-scale Large Multimodal Models

Feb 22, 2024

Baichuan Zhou, Ying Hu, Xi Weng, Junlong Jia, Jie Luo, Xien Liu, Ji Wu, Lei Huang

Abstract: We present the TinyLLaVA framework, which provides a unified perspective for designing and analyzing small-scale Large Multimodal Models (LMMs). We empirically study the effects of different vision encoders, connection modules, language models, training data, and training recipes. Our extensive experiments show that, with better-quality data combined with better training recipes, smaller LMMs can consistently achieve performance on par with bigger LMMs. Under our framework, we train a family of small-scale LMMs. Our best model, TinyLLaVA-3.1B, achieves better overall performance than existing 7B models such as LLaVA-1.5 and Qwen-VL. We hope our findings can serve as baselines for future research in terms of data scaling, training setups, and model selection. Our model weights and code will be made public.

* Our model weights and code will be made public at https://github.com/DLCV-BUAA/TinyLLaVABench
