Bill Yuchen Lin

Shammie

CrossWordBench: Evaluating the Reasoning Capabilities of LLMs and LVLMs with Controllable Puzzle Generation

Mar 30, 2025
Via arXiv

Small Models Struggle to Learn from Strong Reasoners

Feb 17, 2025
Via arXiv

SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities

Feb 17, 2025
Via arXiv

ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning

Feb 03, 2025
Via arXiv

VLRewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models

Nov 26, 2024
Via arXiv

Stronger Models are NOT Stronger Teachers for Instruction Tuning

Nov 12, 2024
Via arXiv

On Memorization of Large Language Models in Logical Reasoning

Oct 30, 2024
Via arXiv

Latent Action Pretraining from Videos

Oct 15, 2024
Via arXiv

CulturalBench: a Robust, Diverse and Challenging Benchmark on Measuring the (Lack of) Cultural Knowledge of LLMs

Oct 03, 2024
Via arXiv

Visual Perception in Text Strings

Oct 02, 2024
Via arXiv