
Leonid Sigal

Prompt2Perturb (P2P): Text-Guided Diffusion-Based Adversarial Attacks on Breast Ultrasound Images

Dec 13, 2024
Via arXiv

Barking Up The Syntactic Tree: Enhancing VLM Training with Syntactic Losses

Dec 11, 2024
Via arXiv

Black Swan: Abductive and Defeasible Video Reasoning in Unpredictable Events

Dec 07, 2024
Via arXiv

Four-Plane Factorized Video Autoencoders

Dec 05, 2024
Via arXiv

Extending Video Masked Autoencoders to 128 frames

Nov 20, 2024
Via arXiv

MM-R$^3$: On (In-)Consistency of Multi-modal Large Language Models (MLLMs)

Oct 07, 2024
Via arXiv

Response Wide Shut: Surprising Observations in Basic Vision Language Model Capabilities

Aug 13, 2024
Via arXiv

On Pre-training of Multimodal Language Models Customized for Chart Understanding

Jul 19, 2024
Via arXiv

Representing Animatable Avatar via Factorized Neural Fields

Jun 02, 2024
Via arXiv

Visual Prompting for Generalized Few-shot Segmentation: A Multi-scale Approach

Apr 17, 2024
Via arXiv