Sijie Cheng

Recurrent Reasoning with Vision-Language Models for Estimating Long-Horizon Embodied Task Progress

Mar 18, 2026
Via arXiv

Building Egocentric Procedural AI Assistant: Methods, Benchmarks, and Challenges

Nov 17, 2025
Via arXiv

StableToolBench-MirrorAPI: Modeling Tool Environments as Mirrors of 7,000+ Real-World APIs

Mar 26, 2025
Via arXiv

VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI

Oct 15, 2024
Via arXiv

Instruction-Guided Visual Masking

May 30, 2024
Via arXiv

ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models

May 24, 2024
Via arXiv

StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of Large Language Models

Mar 13, 2024
Via arXiv

DecisionNCE: Embodied Multimodal Representations via Implicit Preference Learning

Feb 28, 2024
Via arXiv

DEEM: Dynamic Experienced Expert Modeling for Stance Detection

Feb 23, 2024
Via arXiv

Speak It Out: Solving Symbol-Related Problems with Symbol-to-Language Conversion for Language Models

Jan 22, 2024
Via arXiv