Tanzila Rahman

MMFactory: A Universal Solution Search Engine for Vision-Language Tasks
Dec 24, 2024
Via arXiv

Visual Concept-driven Image Generation with Text-to-Image Diffusion Model
Feb 18, 2024
Via arXiv

Prompting Hard or Hardly Prompting: Prompt Inversion for Text-to-Image Diffusion Models
Dec 19, 2023
Via arXiv

Make-A-Story: Visual Memory Conditioned Consistent Story Generation
Nov 23, 2022
Via arXiv

TriBERT: Full-body Human-centric Audio-visual Representation Learning for Visual Sound Separation
Oct 26, 2021
Via arXiv

Weakly-supervised Audio-visual Sound Source Detection and Separation
Mar 25, 2021
Via arXiv

An Improved Attention for Visual Question Answering
Nov 07, 2020
Via arXiv

Watch, Listen and Tell: Multi-modal Weakly Supervised Dense Event Captioning
Oct 25, 2019
Via arXiv

Convolutional Temporal Attention Model for Video-based Person Re-identification
Apr 10, 2019
Via arXiv

Video-based Person Re-identification Using Spatial-Temporal Attention Networks
Oct 26, 2018
Via arXiv