Picture for Shihan Dou

Shihan Dou

From Scores to Preferences: Redefining MOS Benchmarking for Speech Quality Reward Modeling

Add code
Oct 01, 2025
Viaarxiv icon

MDAR: A Multi-scene Dynamic Audio Reasoning Benchmark

Add code
Sep 26, 2025
Viaarxiv icon

LLMEval-3: A Large-Scale Longitudinal Study on Robust and Fair Evaluation of Large Language Models

Add code
Aug 07, 2025
Viaarxiv icon

Progressive Mastery: Customized Curriculum Learning with Guided Prompting for Mathematical Reasoning

Add code
Jun 04, 2025
Viaarxiv icon

LLMEval-Med: A Real-world Clinical Benchmark for Medical LLMs with Physician Validation

Add code
Jun 04, 2025
Figure 1 for LLMEval-Med: A Real-world Clinical Benchmark for Medical LLMs with Physician Validation
Figure 2 for LLMEval-Med: A Real-world Clinical Benchmark for Medical LLMs with Physician Validation
Figure 3 for LLMEval-Med: A Real-world Clinical Benchmark for Medical LLMs with Physician Validation
Figure 4 for LLMEval-Med: A Real-world Clinical Benchmark for Medical LLMs with Physician Validation
Viaarxiv icon

Compression Hacking: A Supplementary Perspective on Informatics Metric of Language Models from Geometric Distortion

Add code
May 23, 2025
Viaarxiv icon

Two Minds Better Than One: Collaborative Reward Modeling for LLM Alignment

Add code
May 19, 2025
Viaarxiv icon

Improving RL Exploration for LLM Reasoning through Retrospective Replay

Add code
Apr 19, 2025
Viaarxiv icon

PFDial: A Structured Dialogue Instruction Fine-tuning Method Based on UML Flowcharts

Add code
Mar 09, 2025
Viaarxiv icon

Measuring Data Diversity for Instruction Tuning: A Systematic Analysis and A Reliable Metric

Add code
Feb 25, 2025
Figure 1 for Measuring Data Diversity for Instruction Tuning: A Systematic Analysis and A Reliable Metric
Figure 2 for Measuring Data Diversity for Instruction Tuning: A Systematic Analysis and A Reliable Metric
Figure 3 for Measuring Data Diversity for Instruction Tuning: A Systematic Analysis and A Reliable Metric
Figure 4 for Measuring Data Diversity for Instruction Tuning: A Systematic Analysis and A Reliable Metric
Viaarxiv icon