Chi Chen

DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding

Mar 17, 2025

Towards Self-Improving Systematic Cognition for Next-Generation Foundation MLLMs

Mar 16, 2025

DNA Origami Nanostructures Observed in Transmission Electron Microscopy Images can be Characterized through Convolutional Neural Networks

Mar 13, 2025

How Do Multimodal Large Language Models Handle Complex Multimodal Reasoning? Placing Them in An Extensible Escape Game

Mar 13, 2025

Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models

Jan 13, 2025

ChartCoder: Advancing Multimodal Large Language Model for Chart-to-Code Generation

Jan 11, 2025

Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency

Jan 09, 2025

LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer

Dec 18, 2024

StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding

Nov 06, 2024

PlaneSAM: Multimodal Plane Instance Segmentation Using the Segment Anything Model

Oct 21, 2024