Kangjia Zhao

ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration

Nov 25, 2024
Via arXiv

HORAE: A Domain-Agnostic Modeling Language for Automating Multimodal Service Regulation

Jun 06, 2024
Via arXiv

Benchmarking Sequential Visual Input Reasoning and Prediction in Multimodal Large Language Models

Oct 20, 2023
Via arXiv