Shiwei Liu

Demystifying the Roles of LLM Layers in Retrieval, Knowledge, and Reasoning

Oct 02, 2025

Diffusion Language Models Know the Answer Before Decoding

Aug 27, 2025

AlphaDecay: Module-wise Weight Decay for Heavy-Tailed Balancing in LLMs

Jun 17, 2025

A Technical Study into Small Reasoning Language Models

Jun 16, 2025

Leave it to the Specialist: Repair Sparse LLMs with Sparse Fine-Tuning via Sparsity Evolution

May 29, 2025

NeuroTrails: Training with Dynamic Sparse Heads as the Key to Effective Ensembling

May 23, 2025

SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers

Feb 27, 2025

Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam

Feb 24, 2025

Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More

Feb 11, 2025

The Curse of Depth in Large Language Models

Feb 09, 2025