
Title: Boosting LLM Reasoning: Push the Limits of Few-shot Learning with Reinforced In-Context Pruning

Dec 26, 2023

Xijie Huang, Li Lyna Zhang, Kwang-Ting Cheng, Mao Yang



Abstract: Large language models (LLMs) have shown impressive capabilities in various tasks, yet they still struggle with math reasoning. Despite efforts to optimize Chain-of-Thought (CoT) prompts and fine-tune LLMs, the potential of few-shot learning remains unexplored. In this work, we propose CoT-Influx, a novel approach that pushes the boundaries of few-shot CoT learning to improve LLM math reasoning capabilities. CoT-Influx addresses two challenges: selecting useful examples, and the limited number of examples that fit within a restricted context window. Inspired by our observation that natural language inputs contain considerable redundancy, we propose a coarse-to-fine pruner as a plug-and-play module for LLMs, which first identifies as many crucial CoT examples as possible and then prunes unimportant tokens so that the examples fit within the context window. To train the pruner, we collect a math reasoning dataset with diverse difficulty and numbers of reasoning steps, introduce a reward that measures both the input's effectiveness for math reasoning and its adherence to token length constraints, and propose a novel training approach based on reinforcement learning. As a result, CoT-Influx significantly outperforms CoT and few-shot prompting baselines across various LLMs (LLaMA2-7B, 13B, 70B) and five mathematical datasets, achieving up to 4.55% absolute improvement. Remarkably, without any fine-tuning, LLaMA2-70B with CoT-Influx surpasses GPT-3.5 and a wide range of larger LLMs (PaLM, Minerva, etc.) on GSM8K.
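To make the coarse-to-fine idea concrete, here is a minimal Python sketch of the two pruning stages and a toy reward. It is not the authors' implementation. The abstract does not specify how scores are computed or how the reward is defined, so the function names, the score inputs (`shot_scores`, `token_scores`), the threshold, the budget, and the penalty form are all hypothetical. In CoT-Influx the pruning decisions come from a pruner trained with reinforcement learning; here the scores are simply passed in by hand.

```python
from typing import List, Sequence


def coarse_to_fine_prune(
    shots: Sequence[Sequence[str]],
    shot_scores: Sequence[float],
    token_scores: Sequence[Sequence[float]],
    shot_threshold: float,
    token_budget: int,
) -> List[List[str]]:
    """Illustrative two-stage pruning (hypothetical interface, not the paper's)."""
    # Coarse stage: keep every example (shot) whose score clears the threshold.
    kept = [i for i, score in enumerate(shot_scores) if score >= shot_threshold]

    # Fine stage: rank all tokens of the kept shots by score and keep the
    # highest-scoring ones until the token budget is exhausted.
    candidates = [
        (token_scores[i][j], i, j) for i in kept for j in range(len(shots[i]))
    ]
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))  # ties broken by position
    keep = {(i, j) for _, i, j in candidates[:token_budget]}

    # Rebuild each kept shot, preserving the original token order.
    return [[tok for j, tok in enumerate(shots[i]) if (i, j) in keep] for i in kept]


def toy_reward(is_correct: bool, num_tokens: int, token_budget: int,
               penalty: float = 1.0) -> float:
    """Assumed reward shape: correctness minus a penalty for exceeding the budget."""
    effectiveness = 1.0 if is_correct else 0.0
    overflow = max(0, num_tokens - token_budget) / token_budget
    return effectiveness - penalty * overflow


shots = [["2", "+", "2", "=", "4"],
         ["the", "sky", "is", "blue"],
         ["3", "*", "3", "=", "9"]]
shot_scores = [0.9, 0.1, 0.8]
token_scores = [[1.0, 0.2, 1.0, 0.5, 1.0],
                [0.0, 0.0, 0.0, 0.0],
                [1.0, 0.3, 1.0, 0.4, 1.0]]

pruned = coarse_to_fine_prune(shots, shot_scores, token_scores,
                              shot_threshold=0.5, token_budget=8)
print(pruned)
```

In this toy run the coarse stage drops the irrelevant second example, and the fine stage drops the lowest-scoring token from each remaining example, so two examples fit in a budget of eight tokens.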

