Yiyang Li

AMemGym: Interactive Memory Benchmarking for Assistants in Long-Horizon Conversations

Mar 02, 2026
Via arXiv

OPBench: A Graph Benchmark to Combat the Opioid Crisis

Feb 16, 2026
Via arXiv

Why Reasoning Fails to Plan: A Planning-Centric Analysis of Long-Horizon Decision Making in LLM Agents

Jan 29, 2026
Via arXiv

LongDA: Benchmarking LLM Agents for Long-Document Data Analysis

Jan 05, 2026
Via arXiv

Instance-level Randomization: Toward More Stable LLM Evaluations

Sep 16, 2025
Via arXiv

OIBench: Benchmarking Strong Reasoning Models with Olympiad in Informatics

Jun 12, 2025
Via arXiv

Graph Foundation Models: A Comprehensive Survey

May 21, 2025
Via arXiv

EfficientLLM: Efficiency in Large Language Models

May 20, 2025
Via arXiv

NGQA: A Nutritional Graph Question Answering Benchmark for Personalized Health-aware Nutritional Reasoning

Dec 20, 2024
Via arXiv

CIER: A Novel Experience Replay Approach with Causal Inference in Deep Reinforcement Learning

May 14, 2024
Via arXiv