Xun Zhou

Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts

Mar 20, 2025
Via arXiv

Frac-Connections: Fractional Extension of Hyper-Connections

Mar 18, 2025
Via arXiv

Memory-augmented Query Reconstruction for LLM-based Knowledge Graph Reasoning

Mar 07, 2025
Via arXiv

HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization

Mar 06, 2025
Via arXiv

FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference

Feb 28, 2025
Via arXiv

Scale-Distribution Decoupling: Enabling Stable and Effective Training of Large Language Models

Feb 21, 2025
Via arXiv

Enhancing Auto-regressive Chain-of-Thought through Loop-Aligned Reasoning

Feb 12, 2025
Via arXiv

Large Memory Network for Recommendation

Feb 08, 2025
Via arXiv

Adaptive Domain Scaling for Personalized Sequential Modeling in Recommenders

Feb 08, 2025
Via arXiv

Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling

Jan 28, 2025
Via arXiv