Picture for Junqi Ge

Junqi Ge

Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding

Add code
Jan 14, 2025
Viaarxiv icon

V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding

Add code
Dec 12, 2024
Figure 1 for V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding
Figure 2 for V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding
Figure 3 for V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding
Figure 4 for V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding
Viaarxiv icon

Parameter-Inverted Image Pyramid Networks

Add code
Jun 06, 2024
Viaarxiv icon