Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Shuzhen Sun

The Scaling Law for LoRA Base on Mutual Information Upper Bound

Jan 06, 2025

Jing Zhang, Hui Gao, Peng Zhang, Shuzhen Sun, Chang Yang, Yuexian Hou

Figure 1 for The Scaling Law for LoRA Base on Mutual Information Upper Bound

Figure 2 for The Scaling Law for LoRA Base on Mutual Information Upper Bound

Figure 3 for The Scaling Law for LoRA Base on Mutual Information Upper Bound

Figure 4 for The Scaling Law for LoRA Base on Mutual Information Upper Bound

Abstract:LoRA (Low-Rank Adaptation) is a widely used model fine-tuning method. In fine-tuning, the law among model performance, model parameters, and data complexity has been a focal issue in the field. Existing methods often leverage external metrics (such as cross-entropy or perplexity) to evaluate model performance. In the fine-tuning process for large models, two types of knowledge are typically involved: the frozen, general knowledge acquired by the model during pre-training and the new knowledge learned through the LoRA module from the current data. Generally, the less LoRA's learned knowledge relies on the large model, the more it captures the specific knowledge of new data, thereby enhancing its adaptability to new tasks. However, external metrics do not readily capture the dependency relationship between these two types of knowledge. Therefore, we designed an internal metric based on the Mutual Information Upper Bound (MIUB) theory to investigate the scaling law of large-model LoRA fine-tuning. In our experiments, we validated this approach on benchmark datasets, using the Llama3-8B and Phi3-3B models. The results show that the proposed MIUB metric aligns more accurately and stably with the scaling law of LoRA fine-tuning compared to cross-entropy and perplexity.

Via

Access Paper or Ask Questions

SEE: Sememe Entanglement Encoding for Transformer-bases Models Compression

Dec 15, 2024

Jing Zhang, Shuzhen Sun, Peng Zhang, Guangxing Cao, Hui Gao, Xindian Ma, Nan Xu, Yuexian Hou

Abstract:Transformer-based large language models exhibit groundbreaking capabilities, but their storage and computational costs are prohibitively high, limiting their application in resource-constrained scenarios. An effective approach is to eliminate redundant model parameters and computational costs while incorporating efficient expert-derived knowledge structures to achieve a balance between compression and performance. Therefore, we propose the \textit{Sememe Entanglement Encoding (SEE)} algorithm. Guided by expert prior knowledge, the model is compressed through the low-rank approximation idea. In Entanglement Embedding, basic semantic units such as sememes are represented as low-dimensional vectors, and then reconstructed into high-dimensional word embeddings through the combination of generalized quantum entanglement. We adapt the Sememe Entanglement Encoding algorithm to transformer-based models of different magnitudes. Experimental results indicate that our approach achieves stable performance while compressing model parameters and computational costs.

Via

Access Paper or Ask Questions