Irfan Essa

Leveraging Procedural Knowledge and Task Hierarchies for Efficient Instructional Video Pre-training

Feb 24, 2025
Via arXiv

MALT Diffusion: Memory-Augmented Latent Transformers for Any-Length Video Generation

Feb 18, 2025
Via arXiv

Calibrated Multi-Preference Optimization for Aligning Diffusion Models

Feb 04, 2025
Via arXiv

Learning Complex Non-Rigid Image Edits from Multimodal Conditioning

Dec 13, 2024
Via arXiv

AfriMed-QA: A Pan-African, Multi-Specialty, Medical Question-Answering Benchmark Dataset

Nov 23, 2024
Via arXiv

Exploring Efficient Foundational Multi-modal Models for Video Summarization

Oct 09, 2024
Via arXiv

Mamba Fusion: Learning Actions Through Questioning

Sep 17, 2024
Via arXiv

Limitations in Employing Natural Language Supervision for Sensor-Based Human Activity Recognition -- And Ways to Overcome Them

Aug 21, 2024
Via arXiv

Cropper: Vision-Language Model for Image Cropping through In-Context Learning

Aug 14, 2024
Via arXiv

CamViG: Camera Aware Image-to-Video Generation with Multimodal Transformers

May 21, 2024
Via arXiv