Ye Fang

GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models

Jan 03, 2025
Via arXiv

Make-it-Real: Unleashing Large Multimodal Model's Ability for Painting 3D Objects with Realistic Materials

Apr 29, 2024
Via arXiv

Gemini vs GPT-4V: A Preliminary Comparison and Combination of Vision-Language Models Through Qualitative Cases

Dec 22, 2023
Via arXiv

Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

Dec 13, 2023
Via arXiv

GPT4Point: A Unified Framework for Point-Language Understanding and Generation

Dec 05, 2023
Via arXiv