Leonid Sigal

MMFactory: A Universal Solution Search Engine for Vision-Language Tasks
Dec 24, 2024

What Has Been Overlooked in Contrastive Source-Free Domain Adaptation: Leveraging Source-Informed Latent Augmentation within Neighborhood Context
Dec 18, 2024

Prompt2Perturb (P2P): Text-Guided Diffusion-Based Adversarial Attacks on Breast Ultrasound Images
Dec 13, 2024

Barking Up The Syntactic Tree: Enhancing VLM Training with Syntactic Losses
Dec 11, 2024

Black Swan: Abductive and Defeasible Video Reasoning in Unpredictable Events
Dec 07, 2024

Four-Plane Factorized Video Autoencoders
Dec 05, 2024

Extending Video Masked Autoencoders to 128 frames
Nov 20, 2024

MM-R$^3$: On (In-)Consistency of Multi-modal Large Language Models (MLLMs)
Oct 07, 2024

Response Wide Shut: Surprising Observations in Basic Vision Language Model Capabilities
Aug 13, 2024

On Pre-training of Multimodal Language Models Customized for Chart Understanding
Jul 19, 2024