Khoa Vo

Domain Expansion: A Latent Space Construction Framework for Multi-Task Learning

Jan 27, 2026
Via arXiv

Clutter-Resistant Vision-Language-Action Models through Object-Centric and Geometry Grounding

Dec 27, 2025
Via arXiv

Rethinking Progression of Memory State in Robotic Manipulation: An Object-Centric Perspective

Nov 18, 2025
Via arXiv

SlotVLA: Towards Modeling of Object-Relation Representations in Robotic Manipulation

Nov 10, 2025
Via arXiv

Amodal Instance Segmentation with Diffusion Shape Prior Estimation

Sep 26, 2024
Via arXiv

Error Detection and Constraint Recovery in Hierarchical Multi-Label Classification without Prior Knowledge

Jul 21, 2024
Via arXiv

HENASY: Learning to Assemble Scene-Entities for Egocentric Video-Language Model

Jun 01, 2024
Via arXiv

ShapeFormer: Shape Prior Visible-to-Amodal Transformer-based Amodal Instance Segmentation

Mar 22, 2024
Via arXiv

ZEETAD: Adapting Pretrained Vision-Language Model for Zero-Shot End-to-End Temporal Action Detection

Nov 04, 2023
Via arXiv

Open-Fusion: Real-time Open-Vocabulary 3D Mapping and Queryable Scene Representation

Oct 05, 2023
Via arXiv