Shwai He

Router-Tuning: A Simple and Effective Approach for Enabling Dynamic-Depth in Transformers

Oct 17, 2024
Via arXiv

What Matters in Transformers? Not All Attention is Needed

Jun 22, 2024
Via arXiv

Demystifying the Compression of Mixture-of-Experts Through a Unified Framework

Jun 04, 2024
Via arXiv

Loki: Low-Rank Keys for Efficient Sparse Attention

Jun 04, 2024
Via arXiv

RESSA: Repair Sparse Vision-Language Models via Sparse Cross-Modality Adaptation

Apr 03, 2024
Via arXiv

Reformatted Alignment

Feb 19, 2024
Via arXiv

Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning

Feb 15, 2024
Via arXiv

Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning

Feb 01, 2024
Via arXiv

Merging Experts into One: Improving Computational Efficiency of Mixture of Experts

Oct 22, 2023
Via arXiv

Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning

Oct 18, 2023
Via arXiv