
Yujie Zhong

HiMix: Reducing Computational Complexity in Large Vision-Language Models

Jan 17, 2025
Via arXiv

Manga Generation via Layout-controllable Diffusion

Dec 26, 2024
Via arXiv

CharGen: High Accurate Character-Level Visual Text Generation Model with MultiModal Encoder

Dec 23, 2024
Via arXiv

InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models

Dec 18, 2024
Via arXiv

Mr. DETR: Instructive Multi-Route Training for Detection Transformers

Dec 13, 2024
Via arXiv

DriveMM: All-in-One Large Multimodal Model for Autonomous Driving

Dec 10, 2024
Via arXiv

LinVT: Empower Your Image-level Large Language Model to Understand Videos

Dec 06, 2024
Via arXiv

TASR: Timestep-Aware Diffusion Model for Image Super-Resolution

Dec 04, 2024
Via arXiv

RFSR: Improving ISR Diffusion Models via Reward Feedback Learning

Dec 04, 2024
Via arXiv

HyperSeg: Towards Universal Visual Segmentation with Large Language Model

Nov 26, 2024
Via arXiv