Ziyang Chen

Logic-in-Frames: Dynamic Keyframe Search via Visual Semantic-Logical Verification for Long Video Understanding

Mar 17, 2025
Via arXiv

A Survey of fMRI to Image Reconstruction

Feb 24, 2025
Via arXiv

From Visuals to Vocabulary: Establishing Equivalence Between Image and Text Token Through Autoregressive Pre-training in MLLMs

Feb 13, 2025
Via arXiv

Test Time Training for 4D Medical Image Interpolation

Feb 04, 2025
Via arXiv

GPS as a Control Signal for Image Generation

Jan 21, 2025
Via arXiv

Advancing General Multimodal Capability of Vision-language Models with Pyramid-descent Visual Position Encoding

Jan 19, 2025
Via arXiv

Hadamard Attention Recurrent Transformer: A Strong Baseline for Stereo Matching Transformer

Jan 02, 2025
Via arXiv

Meta Curvature-Aware Minimization for Domain Generalization

Dec 16, 2024
Via arXiv

Video-Guided Foley Sound Generation with Multimodal Controls

Nov 26, 2024
Via arXiv

Motif Channel Opened in a White-Box: Stereo Matching via Motif Correlation Graph

Nov 19, 2024
Via arXiv