Kaiwen Zhou

SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation

Oct 19, 2024

Multimodal Situational Safety

Oct 08, 2024

RestoreAgent: Autonomous Image Restoration Agent via Multimodal Large Language Models

Jul 25, 2024

Muffin or Chihuahua? Challenging Large Vision-Language Models with Multipanel VQA

Jan 29, 2024

Enhancing Evolving Domain Generalization through Dynamic Latent Representations

Jan 16, 2024

Positional Information Matters for Invariant In-Context Learning: A Case Study of Simple Function Classes

Nov 30, 2023

Does Invariant Graph Learning via Environment Augmentation Learn Invariance?

Oct 29, 2023

ViCor: Bridging Visual Understanding and Commonsense Reasoning with Large Language Models

Oct 09, 2023

Towards Understanding Feature Learning in Out-of-Distribution Generalization

Apr 22, 2023

ESC: Exploration with Soft Commonsense Constraints for Zero-shot Object Navigation

Jan 30, 2023