Alessandro Sordoni

VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment

Oct 02, 2024

Not All LLM Reasoners Are Created Equal

Oct 02, 2024

A Survey on Model MoErging: Recycling and Routing Among Specialized Experts for Collaborative Learning

Aug 13, 2024

Improving Context-Aware Preference Modeling for Language Models

Jul 20, 2024

Efficient Adversarial Training in LLMs with Continuous Attacks

May 24, 2024

Towards Modular LLMs by Building and Reusing a Library of LoRAs

May 18, 2024

V-STaR: Training Verifiers for Self-Taught Reasoners

Feb 09, 2024

Guiding Language Model Reasoning with Planning Tokens

Oct 09, 2023

Deep Language Networks: Joint Prompt Training of Stacked LLMs using Variational Inference

Jun 21, 2023

On the Compositional Generalization Gap of In-Context Learning

Nov 15, 2022