Malvina Nikandrou

Retrievit: In-context Retrieval Capabilities of Transformers, State Space Models, and Hybrid Architectures

Mar 03, 2026

Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users

Mar 28, 2025

CROPE: Evaluating In-Context Adaptation of Vision and Language Models to Culture-Specific Concepts

Oct 20, 2024

Shaking Up VLMs: Comparing Transformers and Structured State Space Models for Vision & Language Modeling

Sep 09, 2024

Enhancing Continual Learning in Visual Question Answering with Modality-Aware Feature Distillation

Jun 27, 2024

Learning To See But Forgetting To Follow: Visual Instruction Tuning Makes LLMs More Prone To Jailbreak Attacks

May 07, 2024

Multitask Multimodal Prompted Training for Interactive Embodied Task Completion

Nov 07, 2023

Quality-agnostic Image Captioning to Safely Assist People with Vision Impairment

May 01, 2023

Going for GOAL: A Resource for Grounded Football Commentaries

Nov 08, 2022