Abstract: Fine-tuning large language models (LLMs) has achieved remarkable success across various NLP tasks, but the substantial memory overhead of backpropagation remains a critical bottleneck, especially as model scales grow. Zeroth-order (ZO) optimization alleviates this issue by estimating gradients from forward passes with Gaussian sampling, avoiding backpropagation altogether. However, conventional ZO methods suffer from high variance in gradient estimation due to their reliance on random perturbations, leading to slow convergence and suboptimal performance. We propose a simple plug-and-play method that incorporates prior-informed perturbations to refine gradient estimation. Our method dynamically computes a guiding vector from Gaussian samples, which steers perturbations toward more informative directions and significantly accelerates convergence compared to standard ZO approaches. We further investigate a greedy perturbation strategy to explore the impact of prior knowledge on gradient estimation. Theoretically, we prove that our gradient estimator achieves stronger alignment with the true gradient direction, enhancing optimization efficiency. Extensive experiments across LLMs of varying scales and architectures demonstrate that our method integrates seamlessly into existing optimization methods, delivering faster convergence and superior performance. Notably, on the OPT-13B model, our method outperforms traditional ZO optimization on all 11 benchmark tasks and surpasses gradient-based baselines on 9 of the 11, striking a robust balance between efficiency and accuracy.
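To make the idea concrete, below is a minimal NumPy sketch of a two-point zeroth-order estimator whose Gaussian perturbation is mixed with a prior-informed guiding direction. The function names, the mixing weight alpha, and the running-average guide are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def zo_gradient(loss_fn, theta, z, eps=1e-3):
    """Two-point zeroth-order gradient estimate along direction z."""
    return (loss_fn(theta + eps * z) - loss_fn(theta - eps * z)) / (2 * eps) * z

def guided_zo_grad(loss_fn, theta, guide, alpha=0.5, eps=1e-3):
    """Bias the Gaussian perturbation toward a prior-informed guiding vector.

    `guide` and `alpha` are illustrative; the paper builds its guiding
    vector dynamically from Gaussian samples.
    """
    z = np.random.randn(*theta.shape)
    g = guide / (np.linalg.norm(guide) + 1e-12)
    d = alpha * g + (1 - alpha) * z  # prior-informed perturbation direction
    return zo_gradient(loss_fn, theta, d, eps)

# Toy usage on a quadratic loss: a running gradient estimate serves as the prior.
rng = np.random.default_rng(0)
theta = rng.standard_normal(10)
loss = lambda w: float(np.sum(w ** 2))
guide = rng.standard_normal(10)
for _ in range(300):
    grad = guided_zo_grad(loss, theta, guide)
    guide = 0.9 * guide + 0.1 * grad  # update the prior from recent samples
    theta -= 0.05 * grad
print(round(loss(theta), 4))
```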




Abstract: Parameter-efficient tuning methods such as LoRA can achieve performance comparable to full model tuning by updating only a small fraction of the parameters. However, substantial computational resources are still required, because the process involves computing gradients and back-propagating through the entire model. Much recent effort has therefore been devoted to derivative-free optimization methods, which avoid gradient computation and have shown greater robustness in few-shot settings. In this paper, we prepend low-rank modules to each self-attention layer of the model and employ two derivative-free optimization methods to optimize the low-rank modules of each layer alternately. Extensive results on various tasks and language models demonstrate that our proposed method achieves substantial improvements and exhibits clear advantages in memory usage and convergence speed over existing gradient-based parameter-efficient tuning and derivative-free optimization methods in few-shot settings.
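As an illustration of the alternating, gradient-free updates, the sketch below applies a (1+1)-evolution-strategy-style step to one layer's low-rank module at a time. The toy objective, module shapes, and step size are assumptions standing in for the frozen PLM and its few-shot loss; the paper's two derivative-free optimizers may differ.

```python
import numpy as np

def evaluate(low_rank_params):
    """Stand-in few-shot objective (lower is better).

    In the actual setting this would inject the low-rank modules into each
    self-attention layer of the frozen PLM and return the few-shot loss;
    a toy quadratic keeps the sketch self-contained.
    """
    return sum(float(np.sum(p ** 2)) for p in low_rank_params)

def derivative_free_step(params, layer, rng, sigma=0.05):
    """(1+1)-ES-style update of one layer's low-rank module: keep the
    perturbed candidate only if it improves the objective."""
    candidate = [p.copy() for p in params]
    candidate[layer] += sigma * rng.standard_normal(candidate[layer].shape)
    return candidate if evaluate(candidate) < evaluate(params) else params

# Alternate updates across layers (4 layers, rank-4 modules of width 16).
rng = np.random.default_rng(0)
params = [0.1 * rng.standard_normal((4, 16)) for _ in range(4)]
for step in range(400):
    params = derivative_free_step(params, layer=step % len(params), rng=rng)
print(round(evaluate(params), 4))
```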
Abstract: Large Language Models (LLMs) have demonstrated remarkable performance across diverse tasks and exhibited impressive reasoning abilities with zero-shot Chain-of-Thought (CoT) prompting. However, because sentence prefixes evolve during the pre-training phase, existing zero-shot CoT methods that apply an identical prompt to every task instance may not be optimal. In this paper, we introduce a novel zero-shot prompting method that leverages evolutionary algorithms to dynamically generate diverse CoT prompts for LLMs. Our approach initializes two CoT prompts, performs LLM-based evolutionary operations to create a varied population, and uses the LLM to select a suitable CoT prompt for a given problem. Additionally, a rewriting operation, guided by the selected CoT prompt, deepens the LLM's understanding of the problem. Extensive experiments across ten reasoning datasets demonstrate the superior performance of our method over current zero-shot CoT prompting methods on GPT-3.5-turbo and GPT-4. Moreover, in-depth analytical experiments underscore its adaptability and effectiveness across a variety of reasoning tasks.
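The pipeline can be pictured with the following sketch, where llm() is a placeholder for any chat-completion call (e.g., GPT-3.5-turbo) and the wordings used for crossover, mutation, selection, and rewriting are illustrative guesses rather than the paper's actual templates.

```python
import random

def llm(prompt: str) -> str:
    """Placeholder for a chat-completion call (e.g., GPT-3.5-turbo or GPT-4)."""
    raise NotImplementedError

def evolve_cot_prompts(seed_prompts, generations=3, offspring=6):
    """Grow a diverse pool of CoT trigger sentences via LLM-based
    crossover and mutation (prompt wordings are illustrative)."""
    pool = list(seed_prompts)
    for _ in range(generations):
        children = []
        for _ in range(offspring):
            a, b = random.sample(pool, 2)
            child = llm(f"Combine the two instructions into one new instruction.\n1. {a}\n2. {b}")
            children.append(llm(f"Paraphrase the instruction below.\n{child}"))  # mutation
        pool += children
    return pool

def select_and_rewrite(question, pool):
    """Ask the LLM to pick a CoT prompt for this instance, then use it to
    rewrite the problem before answering."""
    listing = "\n".join(f"{i}: {p}" for i, p in enumerate(pool))
    idx = int(llm(f"Question: {question}\nChoose the most suitable instruction."
                  f"\n{listing}\nReply with the index only."))
    rewritten = llm(f"{pool[idx]}\nRewrite the following problem more clearly.\n{question}")
    return pool[idx], rewritten
```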




Abstract: Recently, prompt learning has become a new paradigm for utilizing pre-trained language models (PLMs), achieving promising results on downstream tasks with a negligible increase in parameters. Current usage of discrete and continuous prompts assumes that the prompt is fixed for a specific task and that all samples in the task share the same prompt. However, a task may contain quite diverse samples, some easy and others difficult, so diverse prompts are desirable. In this paper, we propose an instance-aware prompt learning method that learns a different prompt for each instance. Specifically, we assume that each learnable prompt token contributes differently to different instances, and we learn this contribution by computing a relevance score between the instance and each prompt token. The contribution-weighted prompt is thus instance-aware. We apply our method to both unidirectional and bidirectional PLMs on both language understanding and generation tasks. Extensive experiments demonstrate that our method obtains considerable improvements over strong baselines. In particular, it achieves state-of-the-art results on the SuperGLUE few-shot learning benchmark.
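A minimal PyTorch sketch of the idea, assuming the relevance score is a scaled dot product between a projected instance representation and each prompt token, and the contribution is a per-token sigmoid weight; the paper's actual scoring and weighting scheme may differ.

```python
import torch
import torch.nn as nn

class InstanceAwarePrompt(nn.Module):
    """Weight each learnable prompt token by its relevance to the instance
    (an illustrative sketch, not the paper's exact parameterization)."""

    def __init__(self, num_tokens=20, hidden=768):
        super().__init__()
        self.prompt = nn.Parameter(0.02 * torch.randn(num_tokens, hidden))
        self.query = nn.Linear(hidden, hidden)  # assumed relevance projection

    def forward(self, instance_repr):
        # instance_repr: (batch, hidden), e.g. mean-pooled input embeddings.
        q = self.query(instance_repr)                                # (batch, hidden)
        scores = q @ self.prompt.t() / self.prompt.size(-1) ** 0.5   # (batch, num_tokens)
        weights = torch.sigmoid(scores)                              # per-token contribution
        return weights.unsqueeze(-1) * self.prompt                   # (batch, num_tokens, hidden)

# The weighted prompt would be prepended to the input embeddings of the frozen PLM.
weighted_prompt = InstanceAwarePrompt()(torch.randn(2, 768))
print(weighted_prompt.shape)  # torch.Size([2, 20, 768])
```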