Qiyang Min

ConceptMoE: Adaptive Token-to-Concept Compression for Implicit Compute Allocation

Jan 29, 2026
Via arXiv

Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space

Dec 31, 2025
Via arXiv

Virtual Width Networks

Nov 17, 2025
Via arXiv

Scaling Latent Reasoning via Looped Language Models

Oct 29, 2025
Via arXiv

SeeDNorm: Self-Rescaled Dynamic Normalization

Oct 26, 2025
Via arXiv

UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning

Aug 26, 2025
Via arXiv

Expert Race: A Flexible Routing Strategy for Scaling Diffusion Transformer with Mixture of Experts

Mar 20, 2025
Via arXiv

Frac-Connections: Fractional Extension of Hyper-Connections

Mar 18, 2025
Via arXiv

Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling

Jan 28, 2025
Via arXiv

Ultra-Sparse Memory Network

Nov 19, 2024
Via arXiv