Yizhe Yang

How Far Are We? Systematic Evaluation of LLMs vs. Human Experts in Mathematical Contest in Modeling

Apr 06, 2026

Consistent Client Simulation for Motivational Interviewing-based Counseling

Feb 05, 2025

CAMI: A Counselor Agent Supporting Motivational Interviewing through State Inference and Topic Exploration

Feb 05, 2025

EvoWiki: Evaluating LLMs on Evolving Knowledge

Dec 18, 2024

PSPO*: An Effective Process-supervised Policy Optimization for Reasoning Alignment

Nov 18, 2024

Speaker Verification in Agent-Generated Conversations

May 16, 2024

Have Seen Me Before? Automating Dataset Updates Towards Reliable and Timely Evaluation

Feb 28, 2024

Graph vs. Sequence: An Empirical Study on Knowledge Forms for Knowledge-Grounded Dialogue

Dec 13, 2023

TSST: A Benchmark and Evaluation Models for Text Speech-Style Transfer

Nov 14, 2023

MindLLM: Pre-training Lightweight Large Language Model from Scratch, Evaluations and Domain Applications

Oct 29, 2023