Mingwei Zhu

ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration

Nov 25, 2024

The intelligent prediction and assessment of financial information risk in the cloud computing model

Apr 14, 2024

Intelligent Classification and Personalized Recommendation of E-commerce Products Based on Machine Learning

Mar 28, 2024

GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection

Dec 22, 2023

Benchmarking Sequential Visual Input Reasoning and Prediction in Multimodal Large Language Models

Oct 20, 2023

VL-CheckList: Evaluating Pre-trained Vision-Language Models with Objects, Attributes and Relations

Jul 01, 2022