Picture for Zhiqi Ge

Zhiqi Ge

Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining

Add code
Dec 13, 2024
Viaarxiv icon

Unified Generative and Discriminative Training for Multi-modal Large Language Models

Add code
Nov 01, 2024
Figure 1 for Unified Generative and Discriminative Training for Multi-modal Large Language Models
Figure 2 for Unified Generative and Discriminative Training for Multi-modal Large Language Models
Figure 3 for Unified Generative and Discriminative Training for Multi-modal Large Language Models
Figure 4 for Unified Generative and Discriminative Training for Multi-modal Large Language Models
Viaarxiv icon

WorldGPT: Empowering LLM as Multimodal World Model

Add code
Apr 28, 2024
Viaarxiv icon

Empowering Vision-Language Models to Follow Interleaved Vision-Language Instructions

Add code
Aug 10, 2023
Viaarxiv icon