Rui Zhao

State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China

On the Suitability of Reinforcement Fine-Tuning to Visual Tasks

Apr 08, 2025

Re-Aligning Language to Visual Objects with an Agentic Workflow

Mar 30, 2025

Zero-Shot Human-Object Interaction Synthesis with Multimodal Priors

Mar 25, 2025

GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing

Mar 13, 2025

Motion Anything: Any to Motion Generation

Mar 10, 2025

DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles

Mar 05, 2025

Semantic Gaussian Mixture Variational Autoencoder for Sequential Recommendation

Feb 22, 2025

PedDet: Adaptive Spectral Optimization for Multimodal Pedestrian Detection

Feb 21, 2025

Sce2DriveX: A Generalized MLLM Framework for Scene-to-Drive Learning

Feb 19, 2025

Text2Earth: Unlocking Text-driven Remote Sensing Image Generation with a Global-Scale Dataset and a Foundation Model

Jan 01, 2025