Picture for Bill Yuchen Lin

Bill Yuchen Lin

Shammie

Stronger Models are NOT Stronger Teachers for Instruction Tuning

Add code
Nov 12, 2024
Viaarxiv icon

On Memorization of Large Language Models in Logical Reasoning

Add code
Oct 30, 2024
Figure 1 for On Memorization of Large Language Models in Logical Reasoning
Figure 2 for On Memorization of Large Language Models in Logical Reasoning
Figure 3 for On Memorization of Large Language Models in Logical Reasoning
Figure 4 for On Memorization of Large Language Models in Logical Reasoning
Viaarxiv icon

Latent Action Pretraining from Videos

Add code
Oct 15, 2024
Figure 1 for Latent Action Pretraining from Videos
Figure 2 for Latent Action Pretraining from Videos
Figure 3 for Latent Action Pretraining from Videos
Figure 4 for Latent Action Pretraining from Videos
Viaarxiv icon

CulturalBench: a Robust, Diverse and Challenging Benchmark on Measuring the (Lack of) Cultural Knowledge of LLMs

Add code
Oct 03, 2024
Viaarxiv icon

Visual Perception in Text Strings

Add code
Oct 02, 2024
Viaarxiv icon

HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions

Add code
Sep 26, 2024
Viaarxiv icon

SimulBench: Evaluating Language Models with Creative Simulation Tasks

Add code
Sep 11, 2024
Viaarxiv icon

OfficeBench: Benchmarking Language Agents across Multiple Applications for Office Automation

Add code
Jul 26, 2024
Viaarxiv icon

The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism

Add code
Jul 15, 2024
Viaarxiv icon

WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs

Add code
Jun 26, 2024
Figure 1 for WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs
Figure 2 for WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs
Figure 3 for WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs
Figure 4 for WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs
Viaarxiv icon