Chenghao Yang

The Chameleon's Limit: Investigating Persona Collapse and Homogenization in Large Language Models

Apr 27, 2026
Via arXiv

When Agents Look the Same: Quantifying Distillation-Induced Similarity in Tool-Use Behaviors

Apr 23, 2026
Via arXiv

WorldTravel: A Realistic Multimodal Travel-Planning Benchmark with Tightly Coupled Constraints

Feb 09, 2026
Via arXiv

DiscoX: Benchmarking Discourse-Level Translation task in Expert Domains

Nov 14, 2025
Via arXiv

Optimizing Diversity and Quality through Base-Aligned Model Collaboration

Nov 07, 2025
Via arXiv

FinSearchComp: Towards a Realistic, Expert-Level Evaluation of Financial Search and Reasoning

Sep 16, 2025
Via arXiv

Tokenized Bandit for LLM Decoding and Alignment

Jun 08, 2025
Via arXiv

MARS-Bench: A Multi-turn Athletic Real-world Scenario Benchmark for Dialogue Evaluation

May 27, 2025
Via arXiv

Grounded Persuasive Language Generation for Automated Marketing

Feb 24, 2025
Via arXiv

CryptoX : Compositional Reasoning Evaluation of Large Language Models

Feb 08, 2025
Via arXiv