Kaiwen Zhou

FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers

Jan 27, 2025

Beyond Pixels: Text Enhances Generalization in Real-World Image Restoration

Dec 01, 2024

SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation

Oct 19, 2024

Multimodal Situational Safety

Oct 08, 2024

RestoreAgent: Autonomous Image Restoration Agent via Multimodal Large Language Models

Jul 25, 2024

Muffin or Chihuahua? Challenging Large Vision-Language Models with Multipanel VQA

Jan 29, 2024

Enhancing Evolving Domain Generalization through Dynamic Latent Representations

Jan 16, 2024

Positional Information Matters for Invariant In-Context Learning: A Case Study of Simple Function Classes

Nov 30, 2023

Does Invariant Graph Learning via Environment Augmentation Learn Invariance?

Oct 29, 2023

ViCor: Bridging Visual Understanding and Commonsense Reasoning with Large Language Models

Oct 09, 2023