Irfan Essa

Learning Complex Non-Rigid Image Edits from Multimodal Conditioning

Dec 13, 2024
Via arXiv

AfriMed-QA: A Pan-African, Multi-Specialty, Medical Question-Answering Benchmark Dataset

Nov 23, 2024
Via arXiv

Exploring Efficient Foundational Multi-modal Models for Video Summarization

Oct 09, 2024
Via arXiv

Mamba Fusion: Learning Actions Through Questioning

Sep 17, 2024
Via arXiv

Limitations in Employing Natural Language Supervision for Sensor-Based Human Activity Recognition -- And Ways to Overcome Them

Aug 21, 2024
Via arXiv

Cropper: Vision-Language Model for Image Cropping through In-Context Learning

Aug 14, 2024
Via arXiv

CamViG: Camera Aware Image-to-Video Generation with Multimodal Transformers

May 21, 2024
Via arXiv

SLAIM: Robust Dense Neural SLAM for Online Tracking and Mapping

Apr 17, 2024
Via arXiv

3D Semantic MapNet: Building Maps for Multi-Object Re-Identification in 3D

Mar 19, 2024
Via arXiv

On the Efficacy of Text-Based Input Modalities for Action Anticipation

Jan 23, 2024
Via arXiv