Sanath Narayan

From Unimodal to Multimodal: Scaling up Projectors to Align Modalities

Sep 28, 2024
Via arXiv

Falcon2-11B Technical Report

Jul 20, 2024
Via arXiv

Open-Vocabulary Temporal Action Localization using Multimodal Guidance

Jun 21, 2024
Via arXiv

Efficient 3D-Aware Facial Image Editing via Attribute-Specific Prompt Learning

Jun 06, 2024
Via arXiv

Multi-modal Generation via Cross-Modal In-Context Learning

May 28, 2024
Via arXiv

ViSpeR: Multilingual Audio-Visual Speech Recognition

May 27, 2024
Via arXiv

Do Vision and Language Encoders Represent the World Similarly?

Jan 10, 2024
Via arXiv

Do VSR Models Generalize Beyond LRS3?

Nov 23, 2023
Via arXiv

Lip2Vec: Efficient and Robust Visual Speech Recognition via Latent-to-Latent Visual to Audio Representation Mapping

Aug 11, 2023
Via arXiv

Remote Sensing Change Detection With Transformers Trained from Scratch

Apr 13, 2023
Via arXiv