Lorenzo Torresani

VITED: Video Temporal Evidence Distillation

Mar 17, 2025
Via arXiv

BIMBA: Selective-Scan Compression for Long-Range Video Question Answering

Mar 13, 2025
Via arXiv

TimeRefine: Temporal Grounding with Time Refining Video LLM

Dec 12, 2024
Via arXiv

Semantic Compositions Enhance Vision-Language Contrastive Learning

Jul 01, 2024
Via arXiv

Step Differences in Instructional Video

Apr 24, 2024
Via arXiv

Video ReCap: Recursive Captioning of Hour-Long Videos

Feb 28, 2024
Via arXiv

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

Nov 30, 2023
Via arXiv

Multiscale Video Pretraining for Long-Term Activity Forecasting

Jul 24, 2023
Via arXiv

Learning to Ground Instructional Articles in Videos through Narrations

Jun 06, 2023
Via arXiv

Open-world Instance Segmentation: Top-down Learning with Bottom-up Supervision

Mar 09, 2023
Via arXiv