Raiymbek Akshulakov

From Unimodal to Multimodal: Scaling up Projectors to Align Modalities

Sep 28, 2024
Via arXiv

Do Vision and Language Encoders Represent the World Similarly?

Jan 10, 2024
Via arXiv

EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding

Aug 17, 2023
Via arXiv