Yuanxin Liu

Temporal Reasoning Transfer from Text to Video

Oct 08, 2024
Via arXiv

DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models

May 31, 2024
Via arXiv

Towards Multimodal Video Paragraph Captioning Models Robust to Missing Modality

Mar 28, 2024
Via arXiv

TempCompass: Do Video LLMs Really Understand Videos?

Mar 01, 2024
Via arXiv

VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models

Nov 29, 2023
Via arXiv

FETV: A Benchmark for Fine-Grained Evaluation of Open-Domain Text-to-Video Generation

Nov 08, 2023
Via arXiv

COST-EFF: Collaborative Optimization of Spatial and Temporal Efficiency with Slenderized Multi-exit Language Models

Oct 27, 2022
Via arXiv

Compressing And Debiasing Vision-Language Pre-Trained Models for Visual Question Answering

Oct 26, 2022
Via arXiv

A Win-win Deal: Towards Sparse and Robust Pre-trained Language Models

Oct 11, 2022
Via arXiv

Language Prior Is Not the Only Shortcut: A Benchmark for Shortcut Learning in VQA

Oct 10, 2022
Via arXiv