Mattia Soldan

ResidualViT for Efficient Temporally Dense Video Encoding

Sep 16, 2025
Via arXiv

OpenTAD: A Unified Framework and Comprehensive Study of Temporal Action Detection

Feb 27, 2025
Via arXiv

Compressed-Language Models for Understanding Compressed File Formats: a JPEG Exploration

May 27, 2024
Via arXiv

Towards Automated Movie Trailer Generation

Apr 04, 2024
Via arXiv

Boundary-Denoising for Video Activity Localization

Apr 06, 2023
Via arXiv

Localizing Moments in Long Video Via Multimodal Guidance

Feb 26, 2023
Via arXiv

Egocentric Video-Language Pretraining @ Ego4D Challenge 2022

Jul 04, 2022
Via arXiv

Egocentric Video-Language Pretraining

Jun 03, 2022
Via arXiv

MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions

Dec 01, 2021
Via arXiv

VLG-Net: Video-Language Graph Matching Network for Video Grounding

Nov 19, 2020
Via arXiv