Daize Dong

LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training

Nov 24, 2024

LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training

Jun 24, 2024

Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts

Jun 17, 2024

Demystifying the Compression of Mixture-of-Experts Through a Unified Framework

Jun 04, 2024

iDAT: inverse Distillation Adapter-Tuning

Mar 23, 2024

A Graph is Worth $K$ Words: Euclideanizing Graph using Pure Transformer

Feb 04, 2024

Cherry Hypothesis: Identifying the Cherry on the Cake for Dynamic Networks

Nov 10, 2022

SparseAdapter: An Easy Approach for Improving the Parameter-Efficiency of Adapters

Oct 11, 2022