Liuyi Wang

GroundVTS: Visual Token Sampling in Multimodal Large Language Models for Video Temporal Grounding

Apr 02, 2026

Deconfounded Lifelong Learning for Autonomous Driving via Dynamic Knowledge Spaces

Mar 15, 2026

RynnBrain: Open Embodied Foundation Models

Feb 13, 2026

VLNVerse: A Benchmark for Vision-Language Navigation with Versatile, Embodied, Realistic Simulation and Evaluation

Dec 22, 2025

CLASH: Collaborative Large-Small Hierarchical Framework for Continuous Vision-and-Language Navigation

Dec 11, 2025

Temporal-Guided Visual Foundation Models for Event-Based Vision

Nov 09, 2025

Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities

Jul 17, 2025

CleanPose: Category-Level Object Pose Estimation via Causal Learning and Knowledge Distillation

Feb 03, 2025

ECBench: Can Multi-modal Foundation Models Understand the Egocentric World? A Holistic Embodied Cognition Benchmark

Jan 09, 2025

MAGIC: Meta-Ability Guided Interactive Chain-of-Distillation for Effective-and-Efficient Vision-and-Language Navigation

Jun 25, 2024