
Rui Zhao

State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China

RemDet: Rethinking Efficient Model Design for UAV Object Detection

Dec 13, 2024
Via arXiv

ASGDiffusion: Parallel High-Resolution Generation with Asynchronous Structure Guidance

Dec 09, 2024
Via arXiv

Representation Purification for End-to-End Speech Translation

Dec 05, 2024
Via arXiv

AlignFormer: Modality Matching Can Achieve Better Zero-shot Instruction-Following Speech-LLM

Dec 02, 2024
Via arXiv

Training a Label-Noise-Resistant GNN with Reduced Complexity

Nov 17, 2024
Via arXiv

PUMA: Empowering Unified MLLM with Multi-granular Visual Generation

Oct 17, 2024
Via arXiv

DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control

Oct 17, 2024
Via arXiv

EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models

Oct 10, 2024
Via arXiv

CTC-GMM: CTC guided modality matching for fast and accurate streaming speech translation

Oct 07, 2024
Via arXiv

Hybrid Mamba for Few-Shot Segmentation

Sep 29, 2024
Via arXiv