Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Bozhou Li

Gradual Learning: Optimizing Fine-Tuning with Partially Mastered Knowledge in Large Language Models

Oct 08, 2024

Bozhou Li, Hao Liang, Yang Li, Fangcheng Fu, Hongzhi Yin, Conghui He, Wentao Zhang

Figure 1 for Gradual Learning: Optimizing Fine-Tuning with Partially Mastered Knowledge in Large Language Models

Figure 2 for Gradual Learning: Optimizing Fine-Tuning with Partially Mastered Knowledge in Large Language Models

Figure 3 for Gradual Learning: Optimizing Fine-Tuning with Partially Mastered Knowledge in Large Language Models

Figure 4 for Gradual Learning: Optimizing Fine-Tuning with Partially Mastered Knowledge in Large Language Models

Abstract:During the pretraining phase, large language models (LLMs) acquire vast amounts of knowledge from extensive text corpora. Nevertheless, in later stages such as fine-tuning and inference, the model may encounter knowledge not covered in the initial training, which can lead to hallucinations and degraded performance. This issue has a profound impact on the model's capabilities, as it will inevitably face out-of-scope knowledge after pretraining. Furthermore, fine-tuning is often required to adapt LLMs to domain-specific tasks. However, this phenomenon limits the model's ability to learn and integrate new information during fine-tuning. The effectiveness of fine-tuning largely depends on the type of knowledge involved. Existing research suggests that fine-tuning the model on partially mastered knowledge-for instance, question-answer pairs where the model has a chance of providing correct responses under non-greedy decoding-can enable the model to acquire new knowledge while mitigating hallucination. Notably, this approach can still lead to the forgetting of fully mastered knowledge, constraining the fine-tuning dataset to a narrower range and limiting the model's overall potential for improvement. Given the model's intrinsic reasoning abilities and the interconnectedness of different knowledge areas, it is likely that as the model's capacity to utilize existing knowledge improves during fine-tuning, previously unmastered knowledge may become more understandable. To explore this hypothesis, we conducted experiments and, based on the results, proposed a two-stage fine-tuning strategy. This approach not only improves the model's overall test accuracy and knowledge retention but also preserves its accuracy on previously mastered content. When fine-tuning on the WikiQA dataset, our method increases the amount of knowledge acquired by the model in this stage by 24%.

Via

Access Paper or Ask Questions

Are Bigger Encoders Always Better in Vision Large Models?

Aug 01, 2024

Bozhou Li, Hao Liang, Zimo Meng, Wentao Zhang

Abstract:In recent years, multimodal large language models (MLLMs) have shown strong potential in real-world applications. They are developing rapidly due to their remarkable ability to comprehend multimodal information and their inherent powerful cognitive and reasoning capabilities. Among MLLMs, vision language models (VLM) stand out for their ability to understand vision information. However, the scaling trend of VLMs under the current mainstream paradigm has not been extensively studied. Whether we can achieve better performance by training even larger models is still unclear. To address this issue, we conducted experiments on the pretraining stage of MLLMs. We conduct our experiment using different encoder sizes and large language model (LLM) sizes. Our findings indicate that merely increasing the size of encoders does not necessarily enhance the performance of VLMs. Moreover, we analyzed the effects of LLM backbone parameter size and data quality on the pretraining outcomes. Additionally, we explored the differences in scaling laws between LLMs and VLMs.

Via

Access Paper or Ask Questions

A Survey of Multimodal Large Language Model from A Data-centric Perspective

May 26, 2024

Tianyi Bai, Hao Liang, Binwang Wan, Ling Yang, Bozhou Li, Yifan Wang, Bin Cui, Conghui He, Binhang Yuan, Wentao Zhang

Figure 1 for A Survey of Multimodal Large Language Model from A Data-centric Perspective

Figure 2 for A Survey of Multimodal Large Language Model from A Data-centric Perspective

Figure 3 for A Survey of Multimodal Large Language Model from A Data-centric Perspective

Figure 4 for A Survey of Multimodal Large Language Model from A Data-centric Perspective

Abstract:Human beings perceive the world through diverse senses such as sight, smell, hearing, and touch. Similarly, multimodal large language models (MLLMs) enhance the capabilities of traditional large language models by integrating and processing data from multiple modalities including text, vision, audio, video, and 3D environments. Data plays a pivotal role in the development and refinement of these models. In this survey, we comprehensively review the literature on MLLMs from a data-centric perspective. Specifically, we explore methods for preparing multimodal data during the pretraining and adaptation phases of MLLMs. Additionally, we analyze the evaluation methods for datasets and review benchmarks for evaluating MLLMs. Our survey also outlines potential future research directions. This work aims to provide researchers with a detailed understanding of the data-driven aspects of MLLMs, fostering further exploration and innovation in this field.

Via

Access Paper or Ask Questions