Khalid Saifullah

LiveBench: A Challenging, Contamination-Free LLM Benchmark

Jun 27, 2024
Via arXiv

CinePile: A Long Video Question Answering Dataset and Benchmark

May 14, 2024
Via arXiv

Coercing LLMs to do and reveal anything

Feb 21, 2024
Via arXiv

On the Reliability of Watermarks for Large Language Models

Jun 30, 2023
Via arXiv

Bring Your Own Data! Self-Supervised Evaluation for Large Language Models

Jun 29, 2023
Via arXiv

Seeing in Words: Learning to Classify through Language Bottlenecks

Jun 29, 2023
Via arXiv

Reinforcement Learning finetuned Vision-Code Transformer for UI-to-Code Generation

May 24, 2023
Via arXiv