Md Mohaiminul Islam

TimeRefine: Temporal Grounding with Time Refining Video LLM

Dec 12, 2024
Via arXiv

An Extensive Study on D2C: Overfitting Remediation in Deep Learning Using a Decentralized Approach

Nov 24, 2024
Via arXiv

ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos

Nov 22, 2024
Via arXiv

Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos

Sep 30, 2024
Via arXiv

Video ReCap: Recursive Captioning of Hour-Long Videos

Feb 28, 2024
Via arXiv

A Simple LLM Framework for Long-Range Video Question-Answering

Dec 28, 2023
Via arXiv

RGNet: A Unified Retrieval and Grounding Network for Long Videos

Dec 11, 2023
Via arXiv

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

Nov 30, 2023
Via arXiv

Efficient Movie Scene Detection using State-Space Transformers

Dec 29, 2022
Via arXiv

Object State Change Classification in Egocentric Videos using the Divided Space-Time Attention Mechanism

Jul 24, 2022
Via arXiv