Shijue Huang

From Abstract to Contextual: What LLMs Still Cannot Do in Mathematics

Jan 30, 2026
Via arXiv

Scaling Environments for LLM Agents in the Era of Learning from Interaction: A Survey

Nov 12, 2025
Via arXiv

AdaCtrl: Towards Adaptive and Controllable Reasoning via Difficulty-Aware Budgeting

May 24, 2025
Via arXiv

Self-Reasoning Language Models: Unfold Hidden Reasoning Chains with Few Reasoning Catalyst

May 20, 2025
Via arXiv

OTC: Optimal Tool Calls via Reinforcement Learning

Apr 21, 2025
Via arXiv

ReTool: Reinforcement Learning for Strategic Tool Use in LLMs

Apr 15, 2025
Via arXiv

UI-TARS: Pioneering Automated GUI Interaction with Native Agents

Jan 21, 2025
Via arXiv

CroPrompt: Cross-task Interactive Prompting for Zero-shot Spoken Language Understanding

Jun 15, 2024
Via arXiv

CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models

Mar 06, 2024
Via arXiv

Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios

Jan 30, 2024
Via arXiv