Yujie Zhong

CharGen: High Accurate Character-Level Visual Text Generation Model with MultiModal Encoder

Dec 23, 2024

InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models

Dec 18, 2024

Mr. DETR: Instructive Multi-Route Training for Detection Transformers

Dec 13, 2024

DriveMM: All-in-One Large Multimodal Model for Autonomous Driving

Dec 10, 2024

LinVT: Empower Your Image-level Large Language Model to Understand Videos

Dec 06, 2024

RFSR: Improving ISR Diffusion Models via Reward Feedback Learning

Dec 04, 2024

TASR: Timestep-Aware Diffusion Model for Image Super-Resolution

Dec 04, 2024

HyperSeg: Towards Universal Visual Segmentation with Large Language Model

Nov 26, 2024

360-Degree Video Super Resolution and Quality Enhancement Challenge: Methods and Results

Nov 11, 2024

Temporal-Enhanced Multimodal Transformer for Referring Multi-Object Tracking and Segmentation

Oct 17, 2024