Ronan Le Bras

TÜLU 3: Pushing Frontiers in Open Language Model Post-Training

Nov 22, 2024

SimpleToM: Exposing the Gap between Explicit ToM Inference and Implicit ToM Application in LLMs

Oct 17, 2024

HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions

Sep 26, 2024

WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries

Jul 24, 2024

WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild

Jun 07, 2024

NovaCOMET: Open Commonsense Foundation Models with Symbolic Knowledge Distillation

Dec 10, 2023

MacGyver: Are Large Language Models Creative Problem Solvers?

Nov 16, 2023

FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions

Oct 31, 2023

Modular Transformers: Compressing Transformers into Modularized Layers for Flexible Efficient Inference

Jun 04, 2023

Commonsense Knowledge Transfer for Pre-trained Language Models

Jun 04, 2023