Paul Hongsuck Seo

Towards Open-Vocabulary Semantic Segmentation Without Semantic Labels
Sep 30, 2024

Pseudo-RIS: Distinctive Pseudo-supervision Generation for Referring Image Segmentation
Jul 10, 2024

Learning Correlation Structures for Vision Transformers
Apr 05, 2024

Zero-shot Referring Image Segmentation with Global-Local Context Features
Apr 03, 2023

AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR
Mar 29, 2023

IFSeg: Image-free Semantic Segmentation via Vision-Language Model
Mar 25, 2023

Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
Mar 21, 2023

CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation
Mar 21, 2023

AVATAR submission to the Ego4D AV Transcription Challenge
Nov 18, 2022

AVATAR: Unconstrained Audiovisual Speech Recognition
Jun 15, 2022