Irfan Essa

Exploring Efficient Foundational Multi-modal Models for Video Summarization

Oct 09, 2024
Via arXiv

Mamba Fusion: Learning Actions Through Questioning

Sep 17, 2024
Via arXiv

Limitations in Employing Natural Language Supervision for Sensor-Based Human Activity Recognition -- And Ways to Overcome Them

Aug 21, 2024
Via arXiv

Cropper: Vision-Language Model for Image Cropping through In-Context Learning

Aug 14, 2024
Via arXiv

CamViG: Camera Aware Image-to-Video Generation with Multimodal Transformers

May 21, 2024
Via arXiv

SLAIM: Robust Dense Neural SLAM for Online Tracking and Mapping

Apr 17, 2024
Via arXiv

3D Semantic MapNet: Building Maps for Multi-Object Re-Identification in 3D

Mar 19, 2024
Via arXiv

On the Efficacy of Text-Based Input Modalities for Action Anticipation

Jan 23, 2024
Via arXiv

Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation

Jan 11, 2024
Via arXiv

VideoPoet: A Large Language Model for Zero-Shot Video Generation

Dec 21, 2023
Via arXiv