Kaiwen Zhou

Beyond Pixels: Text Enhances Generalization in Real-World Image Restoration

Dec 01, 2024

SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation

Oct 19, 2024

Multimodal Situational Safety

Oct 08, 2024

RestoreAgent: Autonomous Image Restoration Agent via Multimodal Large Language Models

Jul 25, 2024

Muffin or Chihuahua? Challenging Large Vision-Language Models with Multipanel VQA

Jan 29, 2024

Enhancing Evolving Domain Generalization through Dynamic Latent Representations

Jan 16, 2024

Positional Information Matters for Invariant In-Context Learning: A Case Study of Simple Function Classes

Nov 30, 2023

Does Invariant Graph Learning via Environment Augmentation Learn Invariance?

Oct 29, 2023

ViCor: Bridging Visual Understanding and Commonsense Reasoning with Large Language Models

Oct 09, 2023

Towards Understanding Feature Learning in Out-of-Distribution Generalization

Apr 22, 2023