Shaofei Huang

LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding

Jan 14, 2025
Via arXiv

Anchor3DLane++: 3D Lane Detection via Sample-Adaptive Sparse 3D Anchor Regression

Dec 22, 2024
Via arXiv

FreeEdit: Mask-free Reference-based Image Editing with Multi-modal Instruction

Sep 26, 2024
Via arXiv

Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segmentation

Aug 28, 2024
Via arXiv

Mask-Enhanced Segment Anything Model for Tumor Lesion Semantic Segmentation

Mar 09, 2024
Via arXiv

Transferring CLIP's Knowledge into Zero-Shot Point Cloud Semantic Segmentation

Dec 12, 2023
Via arXiv

Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training

Dec 04, 2023
Via arXiv

Discovering Sounding Objects by Audio Queries for Audio Visual Segmentation

Sep 18, 2023
Via arXiv

Anchor3DLane: Learning to Regress 3D Anchors for Monocular 3D Lane Detection

Jan 06, 2023
Via arXiv

Cross-Modality Domain Adaptation for Freespace Detection: A Simple yet Effective Baseline

Oct 06, 2022
Via arXiv