Juexiao Zhang

CRAG: Can 3D Generative Models Help 3D Assembly?

Feb 26, 2026

From Intention to Execution: Probing the Generalization Boundaries of Vision-Language-Action Models

Jun 11, 2025

When language and vision meet road safety: leveraging multimodal large language models for video-based traffic accident analysis

Jan 17, 2025

CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos

Nov 26, 2024

Multiview Scene Graph

Oct 15, 2024

VLM See, Robot Do: Human Demo Video to Robot Action Plan via Vision Language Model

Oct 11, 2024

Tell Me Where You Are: Multimodal LLMs Meet Place Recognition

Jun 25, 2024

LUWA Dataset: Learning Lithic Use-Wear Analysis on Microscopic Images

Mar 27, 2024

ActFormer: Scalable Collaborative Perception via Active Queries

Mar 08, 2024

URLOST: Unsupervised Representation Learning without Stationarity or Topology

Oct 06, 2023