Hisham Cholakkal

equal contribution

Paying More Attention to Visual Tokens in Self-Evolving Large Multimodal Models

Jun 25, 2026

Ask, Solve, Generate: Self-Evolving Unified Multimodal Understanding and Generation via Self-Consistency Rewards

Jun 25, 2026

SafeDiffusion-R1: Online Reward Steering for Safe Diffusion Post-Training

May 18, 2026

Paper Circle: An Open-source Multi-agent Research Discovery and Analysis Framework

Apr 07, 2026

CoME-VL: Scaling Complementary Multi-Encoder Vision-Language Learning

Apr 03, 2026

MediX-R1: Open Ended Medical Reinforcement Learning

Feb 26, 2026

Mobile-O: Unified Multimodal Understanding and Generation on Mobile Device

Feb 24, 2026

Audit After Segmentation: Reference-Free Mask Quality Assessment for Language-Referred Audio-Visual Segmentation

Feb 03, 2026

A Benchmark and Agentic Framework for Omni-Modal Reasoning and Tool Use in Long Videos

Dec 18, 2025

How Good are Foundation Models in Step-by-Step Embodied Reasoning?

Sep 18, 2025