Kun-Yu Lin

From Watch to Imagine: Steering Long-horizon Manipulation via Human Demonstration and Future Envisionment

Sep 26, 2025
Via arXiv

CoopDiff: Anticipating 3D Human-object Interactions via Contact-consistent Decoupled Diffusion

Aug 10, 2025
Via arXiv

Panoptic Captioning: Seeking An Equivalency Bridge for Image and Text

May 22, 2025
Via arXiv

Exploring the Limits of Vision-Language-Action Manipulations in Cross-task Generalization

May 21, 2025
Via arXiv

ActionArt: Advancing Multimodal Large Models for Fine-Grained Human-Centric Video Understanding

Apr 25, 2025
Via arXiv

Modeling Multiple Normal Action Representations for Error Detection in Procedural Tasks

Apr 02, 2025
Via arXiv

Decoupled Distillation to Erase: A General Unlearning Method for Any Class-centric Tasks

Mar 31, 2025
Via arXiv

ViSpeak: Visual Instruction Feedback in Streaming Videos

Mar 17, 2025
Via arXiv

Task-Oriented 6-DoF Grasp Pose Detection in Clutters

Feb 24, 2025
Via arXiv

ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations

Jan 24, 2025
Via arXiv