Oncel Tuzel

RayRoPE: Projective Ray Positional Encoding for Multi-view Attention

Jan 21, 2026
Via arXiv

AMUSE: Audio-Visual Benchmark and Alignment Framework for Agentic Multi-Speaker Understanding

Dec 18, 2025
Via arXiv

Learning to Reason for Hallucination Span Detection

Oct 02, 2025
Via arXiv

MobileCLIP2: Improving Multi-Modal Reinforced Training

Aug 28, 2025
Via arXiv

Proxy-FDA: Proxy-based Feature Distribution Alignment for Fine-tuning Vision Foundation Models without Forgetting

May 30, 2025
Via arXiv

FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations

Apr 11, 2025
Via arXiv

TiC-LM: A Web-Scale Benchmark for Time-Continual LLM Pretraining

Apr 02, 2025
Via arXiv

Mutual Reinforcement of LLM Dialogue Synthesis and Summarization Capabilities for Few-Shot Dialogue Summarization

Feb 24, 2025
Via arXiv

3D Shape Tokenization

Dec 24, 2024
Via arXiv

FastVLM: Efficient Vision Encoding for Vision Language Models

Dec 17, 2024
Via arXiv