Jinyoung Yeo

Quantifying Self-Awareness of Knowledge in Large Language Models

Sep 18, 2025

Fast and Fluent Diffusion Language Models via Convolutional Decoding and Rejective Fine-tuning

Sep 18, 2025

Designing Memory-Augmented AR Agents for Spatiotemporal Reasoning in Personalized Task Assistance

Aug 12, 2025

ToolHaystack: Stress-Testing Tool-Augmented Language Models in Realistic Long-Term Interactions

May 29, 2025

LLM Meets Scene Graph: Can Large Language Models Understand and Generate Scene Graphs? A Benchmark and Empirical Study

May 26, 2025

Embodied Agents Meet Personalization: Exploring Memory Utilization for Personalized Assistance

May 22, 2025

Web-Shepherd: Advancing PRMs for Reinforcing Web Agents

May 21, 2025

Rethinking Reward Model Evaluation Through the Lens of Reward Overoptimization

May 19, 2025

KULTURE Bench: A Benchmark for Assessing Language Model in Korean Cultural Context

Dec 10, 2024

Stop Playing the Guessing Game! Target-free User Simulation for Evaluating Conversational Recommender Systems

Nov 25, 2024