Aaron Courville

Université de Montréal

Stick-breaking Attention

Oct 23, 2024

Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models

Oct 23, 2024

Neuroplastic Expansion in Deep Reinforcement Learning

Oct 10, 2024

Don't flatten, tokenize! Unlocking the key to SoftMoE's efficacy in deep RL

Oct 02, 2024

VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment

Oct 02, 2024

Not All LLM Reasoners Are Created Equal

Oct 02, 2024

Managing multiple agents by automatically adjusting incentives

Sep 03, 2024

SAFT: Towards Out-of-Distribution Generalization in Fine-Tuning

Jul 03, 2024

Multimodal foundation world models for generalist embodied agents

Jun 26, 2024

On the consistency of hyper-parameter selection in value-based deep reinforcement learning

Jun 25, 2024