Zonghao Guo

DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding

Mar 17, 2025
Via arXiv

Towards Self-Improving Systematic Cognition for Next-Generation Foundation MLLMs

Mar 16, 2025
Via arXiv

Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models

Jan 13, 2025
Via arXiv

LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer

Dec 18, 2024
Via arXiv

DPVS-Shapley: Faster and Universal Contribution Evaluation Component in Federated Learning

Oct 19, 2024
Via arXiv

Scaling Efficient Masked Autoencoder Learning on Large Remote Sensing Dataset

Jun 17, 2024
Via arXiv

LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images

Mar 18, 2024
Via arXiv

Controllable Dense Captioner with Multimodal Embedding Bridging

Feb 01, 2024
Via arXiv

Bidirectional Feature Globalization for Few-shot Semantic Segmentation of 3D Point Cloud Scenes

Aug 17, 2022
Via arXiv

Integral Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection

May 19, 2022
Via arXiv