Paul Pu Liang

Shammie

OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

Oct 30, 2024

VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks

Oct 24, 2024

Progressive Compositionality In Text-to-Image Generative Models

Oct 22, 2024

TeaserGen: Generating Teasers for Long Documentaries

Oct 08, 2024

MultiMed: Massively Multimodal and Multitask Medical Understanding

Aug 22, 2024

IoT-LM: Large Multisensory Language Models for the Internet of Things

Jul 13, 2024

HEMM: Holistic Evaluation of Multimodal Foundation Models

Jul 03, 2024

Foundations of Multisensory Artificial Intelligence

Apr 29, 2024

Semantically Corrected Amharic Automatic Speech Recognition

Apr 20, 2024

Advancing Social Intelligence in AI Agents: Technical Challenges and Open Questions

Apr 17, 2024