
Tiancheng Zhao

ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration

Nov 25, 2024

Enhancing Ultra High Resolution Remote Sensing Imagery Analysis with ImageRAG

Nov 12, 2024

OmChat: A Recipe to Train Multimodal Language Models with Strong Long Context and Video Understanding

Jul 06, 2024

OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer

Jun 25, 2024

Preserving Knowledge in Large Language Model: A Model-Agnostic Self-Decompression Approach

Jun 17, 2024

QDA-SQL: Questions Enhanced Dialogue Augmentation for Multi-Turn Text-to-SQL

Jun 15, 2024

HORAE: A Domain-Agnostic Modeling Language for Automating Multimodal Service Regulation

Jun 06, 2024

Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head

Mar 11, 2024

GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection

Dec 22, 2023

Benchmarking Sequential Visual Input Reasoning and Prediction in Multimodal Large Language Models

Oct 20, 2023