Ryo Hachiuma

Masking Teacher and Reinforcing Student for Distilling Vision-Language Models

Dec 23, 2025

4D-RGPT: Toward Region-level 4D Understanding via Perceptual Distillation

Dec 22, 2025

Zoom-Zero: Reinforced Coarse-to-Fine Video Understanding via Temporal Zoom-in

Dec 16, 2025

Unified Reinforcement and Imitation Learning for Vision-Language Models

Oct 22, 2025

Bias in Gender Bias Benchmarks: How Spurious Features Distort Evaluation

Sep 09, 2025

Human Preference-Aligned Concept Customization Benchmark via Decomposed Evaluation

Sep 03, 2025

Autoregressive Universal Video Segmentation Model

Aug 26, 2025

GenRecal: Generation after Recalibration from Large to Small Vision-Language Models

Jun 18, 2025

Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks

Jan 14, 2025

VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models

Dec 02, 2024