Xiangyu Zhang

Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model

Mar 14, 2025
Via arXiv

Why Pre-trained Models Fail: Feature Entanglement in Multi-modal Depression Detection

Mar 09, 2025
Via arXiv

Predictable Scale: Part I -- Optimal Hyperparameter Scaling Law in Large Language Model Pretraining

Mar 06, 2025
Via arXiv

Foot-In-The-Door: A Multi-turn Jailbreak for LLMs

Feb 28, 2025
Via arXiv

Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction

Feb 18, 2025
Via arXiv

Unhackable Temporal Rewarding for Scalable Video MLLMs

Feb 17, 2025
Via arXiv

PerPO: Perceptual Preference Optimization via Discriminative Rewarding

Feb 05, 2025
Via arXiv

Predicting 3D representations for Dynamic Scenes

Jan 28, 2025
Via arXiv

CENSOR: Defense Against Gradient Inversion via Orthogonal Subspace Bayesian Sampling

Jan 27, 2025
Via arXiv

Taming Teacher Forcing for Masked Autoregressive Video Generation

Jan 21, 2025
Via arXiv