Pan Zhang

OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?

Jan 09, 2025

BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning

Jan 06, 2025

Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction

Jan 06, 2025

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

Dec 12, 2024

ASANet: Asymmetric Semantic Aligning Network for RGB and SAR image land cover classification

Dec 03, 2024

X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models

Dec 02, 2024

Pattern Integration and Enhancement Vision Transformer for Self-Supervised Learning in Remote Sensing

Nov 09, 2024

MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models

Oct 23, 2024

PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction

Oct 22, 2024

SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree

Oct 21, 2024