Philippe Laban

BingoGuard: LLM Content Moderation Tools with Risk Levels

Mar 09, 2025
Via arXiv

Why Vision Language Models Struggle with Visual Arithmetic? Towards Enhanced Chart and Geometry Understanding

Feb 17, 2025
Via arXiv

SummExecEdit: A Factual Consistency Benchmark in Summarization with Executable Edits

Dec 17, 2024
Via arXiv

CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments

Nov 04, 2024
Via arXiv

Do RAG Systems Cover What Matters? Evaluating and Optimizing Responses with Sub-Question Coverage

Oct 20, 2024
Via arXiv

Can AI writing be salvaged? Mitigating Idiosyncrasies and Improving Human-AI Alignment in the Writing Process through Edits

Sep 26, 2024
Via arXiv

Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems

Jul 01, 2024
Via arXiv

Investigating the prompt leakage effect and black-box defenses for multi-turn LLM interactions

Apr 26, 2024
Via arXiv

MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents

Apr 16, 2024
Via arXiv

Are You Sure? Challenging LLMs Leads to Performance Drops in The FlipFlop Experiment

Nov 14, 2023
Via arXiv