Mushui Liu

CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augmentation

Mar 07, 2025

RectifiedHR: Enable Efficient High-Resolution Image Generation via Energy Rectification

Mar 04, 2025

MINT: Multi-modal Chain of Thought in Unified Generative Models for Enhanced Image Generation

Mar 03, 2025

CustomVideoX: 3D Reference Attention Driven Dynamic Adaptation for Zero-Shot Customized Video Diffusion Transformers

Feb 10, 2025

RestorerID: Towards Tuning-Free Face Restoration with ID Preservation

Nov 21, 2024

Hybrid Mask Generation for Infrared Small Target Detection with Single-Point Supervision

Sep 06, 2024

Frame Order Matters: A Temporal Sequence-Aware Model for Few-Shot Action Recognition

Aug 22, 2024

Envisioning Class Entity Reasoning by Large Language Models for Few-shot Learning

Aug 22, 2024

OmniCLIP: Adapting CLIP for Video Recognition with Spatial-Temporal Omni-Scale Feature Learning

Aug 12, 2024

MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis

Jul 11, 2024