Paul Pu Liang

VLM²-Bench: A Closer Look at How Well VLMs Implicitly Link Explicit Matching Visual Cues

Feb 17, 2025 (via arXiv)

Group-Adaptive Threshold Optimization for Robust AI-Generated Text Detection

Feb 10, 2025 (via arXiv)

OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

Oct 30, 2024 (via arXiv)

VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks

Oct 24, 2024 (via arXiv)

Progressive Compositionality In Text-to-Image Generative Models

Oct 22, 2024 (via arXiv)

TeaserGen: Generating Teasers for Long Documentaries

Oct 08, 2024 (via arXiv)

MultiMed: Massively Multimodal and Multitask Medical Understanding

Aug 22, 2024 (via arXiv)

IoT-LM: Large Multisensory Language Models for the Internet of Things

Jul 13, 2024 (via arXiv)

HEMM: Holistic Evaluation of Multimodal Foundation Models

Jul 03, 2024 (via arXiv)

Foundations of Multisensory Artificial Intelligence

Apr 29, 2024 (via arXiv)