Zihan Qiu

Post-hoc Reward Calibration: A Case Study on Length Bias
Sep 25, 2024

Layerwise Recurrent Router for Mixture-of-Experts
Aug 13, 2024

Reconstructing Global Daily CO2 Emissions via Machine Learning
Jul 29, 2024

A Closer Look into Mixture-of-Experts in Large Language Models
Jun 26, 2024

Unlocking Continual Learning Abilities in Language Models
Jun 25, 2024

GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory
Jun 18, 2024

Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training
May 24, 2024

A Survey on Multilingual Large Language Models: Corpora, Alignment, and Bias
Apr 01, 2024

HyperMoE: Paying Attention to Unselected Experts in Mixture of Experts via Dynamic Transfer
Feb 25, 2024

Empirical Study on Updating Key-Value Memories in Transformer Feed-forward Layers
Feb 19, 2024