Samuel Albanie

ONEBench to Test Them All: Sample-Level Benchmarking Over Open-Ended Capabilities

Dec 09, 2024

How to Merge Your Multimodal Models Over Time?

Dec 09, 2024

Active Data Curation Effectively Distills Large-Scale Multimodal Models

Nov 27, 2024

Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks?

Nov 07, 2024

A Practitioner's Guide to Continual Multimodal Pretraining

Aug 26, 2024

GRAB: A Challenging GRaph Analysis Benchmark for Large Multimodal Models

Aug 21, 2024

On scalable oversight with weak LLMs judging strong LLMs

Jul 05, 2024

HelloFresh: LLM Evaluations on Streams of Real-World Human Editorial Actions across X Community Notes and Wikipedia edits

Jun 05, 2024

Inverse Constitutional AI: Compressing Preferences into Principles

Jun 02, 2024

A Tale of Two Languages: Large-Vocabulary Continuous Sign Language Recognition from Spoken Language Supervision

May 16, 2024