Tanvir Mahmud

OpenSep: Leveraging Large Language Models with Textual Inversion for Open World Audio Separation

Sep 28, 2024

Ada-VE: Training-Free Consistent Video Editing Using Adaptive Motion Prior

Jun 07, 2024

MA-AVT: Modality Alignment for Parameter-Efficient Audio-Visual Transformers

Jun 07, 2024

Weakly-supervised Audio Separation via Bi-modal Semantic Similarity

Apr 02, 2024

T-VSL: Text-Guided Visual Sound Source Localization in Mixtures

Apr 02, 2024

PaPr: Training-Free One-Step Patch Pruning with Lightweight ConvNets for Faster Inference

Mar 24, 2024

SSVOD: Semi-Supervised Video Object Detection with Sparse Annotations

Sep 04, 2023

Instance-Aware Repeat Factor Sampling for Long-Tailed Object Detection

May 14, 2023

CIFF-Net: Contextual Image Feature Fusion for Melanoma Diagnosis

Mar 07, 2023

AVE-CLIP: AudioCLIP-based Multi-window Temporal Transformer for Audio Visual Event Localization

Oct 11, 2022