Video Summarization


Video summarization is the process of creating a concise representation of a video that preserves its most important information.
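As a rough illustration of the definition above, one common framing treats summarization as keyframe selection: given a per-frame importance score, keep the highest-scoring frames within a fixed budget. The sketch below assumes the scores are already available. The function name `summarize_by_score`, the `budget` parameter, and the sample scores are hypothetical and chosen for illustration. This is not the method of any paper listed here.

```python
from typing import List, Sequence


def summarize_by_score(scores: Sequence[float], budget: int) -> List[int]:
    """Return indices of the `budget` highest-scoring frames, in temporal order.

    `scores` holds one hypothetical importance value per frame; how these
    values are produced (e.g. by a learned model) is outside this sketch.
    """
    if budget <= 0:
        return []
    # Rank frame indices by importance, highest first; ties keep the earlier frame.
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    # Keep the top `budget` frames and restore temporal order.
    return sorted(ranked[:budget])


# Illustrative scores for a six-frame clip (made-up values).
frame_scores = [0.1, 0.9, 0.3, 0.8, 0.2, 0.7]
summary = summarize_by_score(frame_scores, budget=3)
print(summary)  # [1, 3, 5]
```

Returning the selected indices in temporal order keeps the summary playable as a shortened clip. Practical systems typically add constraints this sketch omits, such as selecting whole shots rather than single frames or penalizing redundant selections.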

SONIC-O1: A Real-World Benchmark for Evaluating Multimodal Large Language Models on Audio-Video Understanding

Jan 29, 2026

Semantic-Guided Unsupervised Video Summarization

Jan 21, 2026

Where is the multimodal goal post? On the Ability of Foundation Models to Recognize Contextually Important Moments

Jan 22, 2026

Less is More: Label-Guided Summarization of Procedural and Instructional Videos

Jan 18, 2026

LLMTrack: Semantic Multi-Object Tracking with Multi-modal Large Language Models

Jan 10, 2026

From Understanding to Engagement: Personalized Pharmacy Video Clips via Vision Language Models (VLMs)

Jan 08, 2026

Robust Egocentric Visual Attention Prediction Through Language-guided Scene Context-aware Learning

Jan 05, 2026

MovieRecapsQA: A Multimodal Open-Ended Video Question-Answering Benchmark

Jan 05, 2026

Analyzing Reasoning Consistency in Large Multimodal Models under Cross-Modal Conflicts

Jan 07, 2026

An Architecture-Led Hybrid Report on Body Language Detection Project

Dec 28, 2025