Rui Zhao

State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China

Text2Earth: Unlocking Text-driven Remote Sensing Image Generation with a Global-Scale Dataset and a Foundation Model

Jan 01, 2025

Interleaved Speech-Text Language Models are Simple Streaming Text to Speech Synthesizers

Dec 23, 2024

"They've Stolen My GPL-Licensed Model!": Toward Standardized and Transparent Model Licensing

Dec 16, 2024

RemDet: Rethinking Efficient Model Design for UAV Object Detection

Dec 13, 2024

ASGDiffusion: Parallel High-Resolution Generation with Asynchronous Structure Guidance

Dec 09, 2024

Representation Purification for End-to-End Speech Translation

Dec 05, 2024

AlignFormer: Modality Matching Can Achieve Better Zero-shot Instruction-Following Speech-LLM

Dec 02, 2024

Training a Label-Noise-Resistant GNN with Reduced Complexity

Nov 17, 2024

DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control

Oct 17, 2024

PUMA: Empowering Unified MLLM with Multi-granular Visual Generation

Oct 17, 2024