Shiwei Liu

Why Diffusion Language Models Struggle with Truly Parallel (Non-Autoregressive) Decoding?

Feb 26, 2026

Search or Accelerate: Confidence-Switched Position Beam Search for Diffusion Language Models

Feb 11, 2026

Demystifying the Roles of LLM Layers in Retrieval, Knowledge, and Reasoning

Oct 02, 2025

Diffusion Language Models Know the Answer Before Decoding

Aug 27, 2025

AlphaDecay: Module-wise Weight Decay for Heavy-Tailed Balancing in LLMs

Jun 17, 2025

A Technical Study into Small Reasoning Language Models

Jun 16, 2025

Leave it to the Specialist: Repair Sparse LLMs with Sparse Fine-Tuning via Sparsity Evolution

May 29, 2025

NeuroTrails: Training with Dynamic Sparse Heads as the Key to Effective Ensembling

May 23, 2025

SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers

Feb 27, 2025

Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam

Feb 24, 2025