Serge Belongie

Cornell Tech

MMEarth-Bench: Global Model Adaptation via Multimodal Test-Time Training

Feb 06, 2026
Via arXiv

RAIGen: Rare Attribute Identification in Text-to-Image Generative Models

Feb 06, 2026
Via arXiv

Thinking in Frames: How Visual Context and Test-Time Scaling Empower Video Reasoning

Jan 28, 2026
Via arXiv

SuperF: Neural Implicit Fields for Multi-Image Super-Resolution

Dec 09, 2025
Via arXiv

OneStory: Coherent Multi-Shot Video Generation with Adaptive Memory

Dec 08, 2025
Via arXiv

Stitch: Training-Free Position Control in Multimodal Diffusion Transformers

Sep 30, 2025
Via arXiv

RespoDiff: Dual-Module Bottleneck Transformation for Responsible & Faithful T2I Generation

Sep 18, 2025
Via arXiv

Is Meta-Learning Out? Rethinking Unsupervised Few-Shot Classification with Limited Entropy

Sep 16, 2025
Via arXiv

Cultural Evaluations of Vision-Language Models Have a Lot to Learn from Cultural Theory

May 28, 2025
Via arXiv

RAVENEA: A Benchmark for Multimodal Retrieval-Augmented Visual Culture Understanding

May 20, 2025
Via arXiv