Ken Deng

Lost in Space? Vision-Language Models Struggle with Relative Camera Pose Estimation

Jan 29, 2026

ShopSimulator: Evaluating and Exploring RL-Driven LLM Agent for Shopping Assistants

Jan 26, 2026

SWE-Compass: Towards Unified Evaluation of Agentic Coding Abilities for Large Language Models

Nov 07, 2025

GeoSAM2: Unleashing the Power of SAM2 for 3D Part Segmentation

Aug 19, 2025

CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models

Feb 23, 2025

ExecRepoBench: Multi-level Executable Code Completion Evaluation

Dec 16, 2024

DetailGen3D: Generative 3D Geometry Enhancement via Data-Dependent Flow

Nov 25, 2024

M2rc-Eval: Massively Multilingual Repository-level Code Completion Evaluation

Oct 28, 2024

MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models

Oct 15, 2024

DDK: Distilling Domain Knowledge for Efficient Large Language Models

Jul 23, 2024