Harry Yang

LoopViT: Scaling Visual ARC with Looped Transformers

Feb 02, 2026
Via arXiv

RainFusion2.0: Temporal-Spatial Awareness and Hardware-Efficient Block-wise Sparse Attention

Dec 30, 2025
Via arXiv

OpenSubject: Leveraging Video-Derived Identity and Diversity Priors for Subject-driven Image Generation and Manipulation

Dec 10, 2025
Via arXiv

Distribution Matching Distillation Meets Reinforcement Learning

Nov 19, 2025
Via arXiv

Zero-shot Synthetic Video Realism Enhancement via Structure-aware Denoising

Nov 18, 2025
Via arXiv

Follow-Your-Shape: Shape-Aware Image Editing via Trajectory-Guided Region Control

Aug 12, 2025
Via arXiv

Enhancing Vector Quantization with Distributional Matching: A Theoretical and Empirical Study

Jun 18, 2025
Via arXiv

When Semantics Mislead Vision: Mitigating Large Multimodal Models Hallucinations in Scene Text Spotting and Understanding

Jun 05, 2025
Via arXiv

Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models

Apr 04, 2025
Via arXiv

VideoGen-of-Thought: Step-by-step generating multi-shot video with minimal manual intervention

Mar 20, 2025
Via arXiv