Picture for Bill Yuchen Lin

Bill Yuchen Lin

Shammie

Small Models Struggle to Learn from Strong Reasoners

Add code
Feb 17, 2025
Viaarxiv icon

SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities

Add code
Feb 17, 2025
Viaarxiv icon

ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning

Add code
Feb 03, 2025
Viaarxiv icon

VLRewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models

Add code
Nov 26, 2024
Figure 1 for VLRewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models
Figure 2 for VLRewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models
Figure 3 for VLRewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models
Figure 4 for VLRewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models
Viaarxiv icon

Stronger Models are NOT Stronger Teachers for Instruction Tuning

Add code
Nov 12, 2024
Figure 1 for Stronger Models are NOT Stronger Teachers for Instruction Tuning
Figure 2 for Stronger Models are NOT Stronger Teachers for Instruction Tuning
Figure 3 for Stronger Models are NOT Stronger Teachers for Instruction Tuning
Figure 4 for Stronger Models are NOT Stronger Teachers for Instruction Tuning
Viaarxiv icon

On Memorization of Large Language Models in Logical Reasoning

Add code
Oct 30, 2024
Figure 1 for On Memorization of Large Language Models in Logical Reasoning
Figure 2 for On Memorization of Large Language Models in Logical Reasoning
Figure 3 for On Memorization of Large Language Models in Logical Reasoning
Figure 4 for On Memorization of Large Language Models in Logical Reasoning
Viaarxiv icon

Latent Action Pretraining from Videos

Add code
Oct 15, 2024
Figure 1 for Latent Action Pretraining from Videos
Figure 2 for Latent Action Pretraining from Videos
Figure 3 for Latent Action Pretraining from Videos
Figure 4 for Latent Action Pretraining from Videos
Viaarxiv icon

CulturalBench: a Robust, Diverse and Challenging Benchmark on Measuring the (Lack of) Cultural Knowledge of LLMs

Add code
Oct 03, 2024
Figure 1 for CulturalBench: a Robust, Diverse and Challenging Benchmark on Measuring the (Lack of) Cultural Knowledge of LLMs
Figure 2 for CulturalBench: a Robust, Diverse and Challenging Benchmark on Measuring the (Lack of) Cultural Knowledge of LLMs
Figure 3 for CulturalBench: a Robust, Diverse and Challenging Benchmark on Measuring the (Lack of) Cultural Knowledge of LLMs
Figure 4 for CulturalBench: a Robust, Diverse and Challenging Benchmark on Measuring the (Lack of) Cultural Knowledge of LLMs
Viaarxiv icon

Visual Perception in Text Strings

Add code
Oct 02, 2024
Figure 1 for Visual Perception in Text Strings
Figure 2 for Visual Perception in Text Strings
Figure 3 for Visual Perception in Text Strings
Figure 4 for Visual Perception in Text Strings
Viaarxiv icon

HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions

Add code
Sep 26, 2024
Figure 1 for HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions
Figure 2 for HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions
Figure 3 for HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions
Figure 4 for HAICOSYSTEM: An Ecosystem for Sandboxing Safety Risks in Human-AI Interactions
Viaarxiv icon