Bang Yang

VisionGPT-3D: A Generalized Multimodal Agent for Enhanced 3D Vision Understanding

Mar 22, 2024
Via arXiv

VisionGPT: Vision-Language Understanding Agent Using Generalized Multimodal Framework

Mar 14, 2024
Via arXiv

WorldGPT: A Sora-Inspired Video AI Agent as Rich World Models from Text and Image Inputs

Mar 10, 2024
Via arXiv

Embracing Language Inclusivity and Diversity in CLIP through Continual Language Learning

Jan 30, 2024
Via arXiv

Improving Medical Report Generation with Adapter Tuning and Knowledge Enhancement in Vision-Language Foundation Models

Dec 07, 2023
Via arXiv

UnifiedVisionGPT: Streamlining Vision-Oriented AI through Generalized Multimodal Framework

Nov 16, 2023
Via arXiv

MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning

Aug 25, 2023
Via arXiv

Multimodal Prompt Learning for Product Title Generation with Extremely Limited Labels

Jul 05, 2023
Via arXiv

Customizing General-Purpose Foundation Models for Medical Report Generation

Jun 09, 2023
Via arXiv

Unify, Align and Refine: Multi-Level Semantic Alignment for Radiology Report Generation

Apr 05, 2023
Via arXiv