Title: Guiding Through Complexity: What Makes Good Supervision for Hard Reasoning Tasks?

Oct 27, 2024

Xuan He, Da Yin, Nanyun Peng

Abstract: How can "weak teacher models", such as average human annotators or existing AI systems, effectively supervise LLMs to improve performance on hard reasoning tasks, especially tasks that challenge the teacher models themselves and require expertise or daily practice from them? In this paper, we seek empirical answers to this question by investigating various data-driven strategies that offer supervision data at different quality levels on tasks of varying complexity. Two intuitive strategies emerge for teacher models to provide supervision during alignment training: 1) using lower-quality supervision from complete tasks that match the difficulty of the target reasoning tasks, and 2) leveraging higher-quality supervision from easier subtasks that are less challenging. Interestingly, we find that even when the outcome error rate for hard task supervision is high (e.g., 90%), training on such data can outperform perfectly correct supervision on easier subtasks on multiple hard math benchmarks. We further identify a more critical factor influencing training performance: step-wise error rates, which indicate the severity of errors in solutions. Specifically, training on hard task supervision with the same outcome error rates but disparate step-wise error rates can lead to a 30% accuracy gap on the MATH benchmark. Our results also reveal that supplementing hard task supervision with the corresponding subtask supervision can yield larger performance improvements than simply combining it with rephrased hard full-task supervision, suggesting new avenues for data augmentation. Data and code are released at https://github.com/hexuan21/Weak-to-Strong.
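The distinction between outcome error rate and step-wise error rate can be illustrated with a small sketch. This is not code from the paper or its repository: the `Solution` class, the per-step correctness labels, and the definition of step-wise error rate as the fraction of incorrect steps across all solutions are assumptions made for illustration only. The two toy datasets share the same outcome error rate but differ in how badly the wrong solutions go wrong.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Solution:
    # Hypothetical labels: True means the step (or final answer) is correct.
    step_correct: List[bool]
    final_correct: bool


def outcome_error_rate(solutions: List[Solution]) -> float:
    """Fraction of solutions whose final answer is wrong."""
    return sum(not s.final_correct for s in solutions) / len(solutions)


def stepwise_error_rate(solutions: List[Solution]) -> float:
    """Fraction of incorrect steps over all steps (an assumed definition)."""
    total_steps = sum(len(s.step_correct) for s in solutions)
    wrong_steps = sum(s.step_correct.count(False) for s in solutions)
    return wrong_steps / total_steps


# Same outcome error rate (one of two final answers is wrong) in both sets,
# but the wrong solution in `severe` has every step incorrect.
mild = [
    Solution([True, True, True, False], final_correct=False),
    Solution([True, True, True, True], final_correct=True),
]
severe = [
    Solution([False, False, False, False], final_correct=False),
    Solution([True, True, True, True], final_correct=True),
]

for name, data in [("mild", mild), ("severe", severe)]:
    print(name, outcome_error_rate(data), stepwise_error_rate(data))
```

Under these assumed labels, both sets have an outcome error rate of 0.5, while the step-wise error rate is 0.125 for `mild` and 0.5 for `severe`, which is the kind of gap the abstract identifies as the more critical factor.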
