Yuanxin Liu

PunchBench: Benchmarking MLLMs in Multimodal Punchline Comprehension

Dec 16, 2024

Temporal Reasoning Transfer from Text to Video

Oct 08, 2024

DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models

May 31, 2024

Towards Multimodal Video Paragraph Captioning Models Robust to Missing Modality

Mar 28, 2024

TempCompass: Do Video LLMs Really Understand Videos?

Mar 01, 2024

VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models

Nov 29, 2023

FETV: A Benchmark for Fine-Grained Evaluation of Open-Domain Text-to-Video Generation

Nov 08, 2023

COST-EFF: Collaborative Optimization of Spatial and Temporal Efficiency with Slenderized Multi-exit Language Models

Oct 27, 2022

Compressing And Debiasing Vision-Language Pre-Trained Models for Visual Question Answering

Oct 26, 2022

A Win-win Deal: Towards Sparse and Robust Pre-trained Language Models

Oct 11, 2022