Xiaoming Wei

Denoising with a Joint-Embedding Predictive Architecture

Oct 02, 2024
Via arXiv

FreeEdit: Mask-free Reference-based Image Editing with Multi-modal Instruction

Sep 26, 2024
Via arXiv

Dynamic Prompting of Frozen Text-to-Image Diffusion Models for Panoptic Narrative Grounding

Sep 12, 2024
Via arXiv

Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segmentation

Aug 28, 2024
Via arXiv

Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input

Aug 28, 2024
Via arXiv

Fine-gained Zero-shot Video Sampling

Jul 31, 2024
Via arXiv

Deformable 3D Shape Diffusion Model

Jul 31, 2024
Via arXiv

BEM: Balanced and Entropy-based Mix for Long-Tailed Semi-Supervised Learning

Apr 01, 2024
Via arXiv

ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting

Mar 01, 2024
Via arXiv

Enriching Phrases with Coupled Pixel and Object Contexts for Panoptic Narrative Grounding

Nov 02, 2023
Via arXiv