Qing-Guo Chen

Evaluating Image Caption via Cycle-consistent Text-to-Image Generation

Jan 08, 2025

MDP3: A Training-free Approach for List-wise Frame Selection in Video-LLMs

Jan 06, 2025

UNIC-Adapter: Unified Image-instruction Adapter with Multi-modal Transformer for Image Generation

Dec 25, 2024

OmniEvalKit: A Modular, Lightweight Toolbox for Evaluating Large Language Model and its Omni-Extensions

Dec 09, 2024

PEMF-VVTO: Point-Enhanced Video Virtual Try-on via Mask-free Paradigm

Dec 05, 2024

Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis

Oct 10, 2024

Advancing Tool-Augmented Large Language Models: Integrating Insights from Errors in Inference Trees

Jun 11, 2024

Wings: Learning Multimodal LLMs without Text-only Forgetting

Jun 05, 2024

Parrot: Multilingual Visual Instruction Tuning

Jun 04, 2024

Ovis: Structural Embedding Alignment for Multimodal Large Language Model

May 31, 2024