Sergio Escalera

UB

SOVABench: A Vehicle Surveillance Action Retrieval Benchmark for Multimodal Large Language Models

Jan 08, 2026

PrismVAU: Prompt-Refined Inference System for Multimodal Video Anomaly Understanding

Jan 07, 2026

Interact2Ar: Full-Body Human-Human Interaction Generation via Autoregressive Diffusion Models

Dec 22, 2025

RVLF: A Reinforcing Vision-Language Framework for Gloss-Free Sign Language Translation

Dec 08, 2025

SoccerNet 2025 Challenges Results

Aug 26, 2025

COOkeD: Ensemble-based OOD detection in the era of zero-shot CLIP

Jul 30, 2025

Sparse-Dense Side-Tuner for efficient Video Temporal Grounding

Jul 10, 2025

REACT 2025: the Third Multiple Appropriate Facial Reaction Generation Challenge

May 22, 2025

L-SWAG: Layer-Sample Wise Activation with Gradients information for Zero-Shot NAS on Vision Transformers

May 12, 2025

Action Anticipation from SoccerNet Football Video Broadcasts

Apr 16, 2025