Xiaoyu Yue

Diffusion Models Need Visual Priors for Image Generation

Oct 11, 2024
Via arXiv

OV-PARTS: Towards Open-Vocabulary Part Segmentation

Oct 08, 2023
Via arXiv

Understanding Masked Autoencoders From a Local Contrastive Perspective

Oct 03, 2023
Via arXiv

In Defense of Clip-based Video Relation Detection

Jul 18, 2023
Via arXiv

Rethinking the Two-Stage Framework for Grounded Situation Recognition

Dec 10, 2021
Via arXiv

MMOCR: A Comprehensive Toolbox for Text Detection, Recognition and Understanding

Aug 14, 2021
Via arXiv

Vision Transformer with Progressive Sampling

Aug 03, 2021
Via arXiv

Spatial Dual-Modality Graph Reasoning for Key Information Extraction

Mar 26, 2021
Via arXiv

HOSE-Net: Higher Order Structure Embedded Network for Scene Graph Generation

Aug 12, 2020
Via arXiv

RobustScanner: Dynamically Enhancing Positional Clues for Robust Text Recognition

Jul 17, 2020
Via arXiv