Jianhao Yan

A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond

Mar 27, 2025

RefuteBench 2.0 -- Agentic Benchmark for Dynamic Evaluation of LLM Responses to Refutation Instruction

Feb 25, 2025

Unveiling Attractor Cycles in Large Language Models: A Dynamical Systems View of Successive Paraphrasing

Feb 21, 2025

Benchmarking GPT-4 against Human Translators: A Comprehensive Evaluation Across Languages, Domains, and Expertise Levels

Nov 21, 2024

Keys to Robust Edits: from Theoretical Insights to Practical Advances

Oct 12, 2024

ELICIT: LLM Augmentation via External In-Context Capability

Oct 12, 2024

See What LLMs Cannot Answer: A Self-Challenge Framework for Uncovering LLM Weaknesses

Aug 16, 2024

GPT-4 vs. Human Translators: A Comprehensive Evaluation of Translation Quality Across Languages, Domains, and Expertise Levels

Jul 04, 2024

What Have We Achieved on Non-autoregressive Translation?

May 21, 2024

RefuteBench: Evaluating Refuting Instruction-Following for Large Language Models

Feb 22, 2024