Yizhe Xiong

Finedeep: Mitigating Sparse Activation in Dense LLMs via Multi-Layer Fine-Grained Experts

Feb 18, 2025
Via arXiv

DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs

Feb 18, 2025
Via arXiv

Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

Dec 30, 2024
Via arXiv

Breaking the Stage Barrier: A Novel Single-Stage Approach to Long Context Extension for Large Language Models

Dec 10, 2024
Via arXiv

LBPE: Long-token-first Tokenization to Improve Large Language Models

Nov 08, 2024
Via arXiv

CartesianMoE: Boosting Knowledge Sharing among Experts via Cartesian Product Routing in Mixture-of-Experts

Oct 21, 2024
Via arXiv

MaskMoE: Boosting Token-Level Learning via Routing Mask in Mixture-of-Experts

Jul 13, 2024
Via arXiv

Temporal Scaling Law for Large Language Models

Apr 27, 2024
Via arXiv

Scaffold-BPE: Enhancing Byte Pair Encoding with Simple and Effective Scaffold Token Removal

Apr 27, 2024
Via arXiv

PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation

Mar 14, 2024
Via arXiv