Taco Cohen

RL-finetuning LLMs from on- and off-policy data with a single algorithm

Mar 25, 2025

The KoLMogorov Test: Compression by Code Generation

Mar 18, 2025

Soft Policy Optimization: Online Off-Policy RL for Sequence Models

Mar 07, 2025

Does equivariance matter at scale?

Oct 30, 2024

What Makes Large Language Models Reason in (Multi-Turn) Code Generation?

Oct 10, 2024

RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning

Oct 02, 2024

Information-driven Affordance Discovery for Efficient Robotic Manipulation

May 06, 2024

CodeIt: Self-Improving Language Models with Prioritized Hindsight Replay

Feb 07, 2024

A Hitchhiker's Guide to Geometric GNNs for 3D Atomic Systems

Dec 12, 2023

FoMo Rewards: Can we cast foundation models as reward functions?

Dec 06, 2023