Wei-Lin Chiang

Prompt-to-Leaderboard

Feb 20, 2025
Via arXiv

Exploring and Mitigating Adversarial Manipulation of Voting-Based Leaderboards

Jan 13, 2025
Via arXiv

VisionArena: 230K Real World User-VLM Conversations with Preference Labels

Dec 11, 2024
Via arXiv

SkyServe: Serving AI Models across Regions and Clouds with Spot Instances

Nov 03, 2024
Via arXiv

How to Evaluate Reward Models for RLHF

Oct 18, 2024
Via arXiv

RouteLLM: Learning to Route LLMs with Preference Data

Jun 26, 2024
Via arXiv

From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline

Jun 17, 2024
Via arXiv

OR-Bench: An Over-Refusal Benchmark for Large Language Models

May 31, 2024
Via arXiv

Mélange: Cost Efficient Large Language Model Serving by Exploiting GPU Heterogeneity

Apr 22, 2024
Via arXiv

Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference

Mar 07, 2024
Via arXiv