
Bei Li

Early Exit Is a Natural Capability in Transformer-based Models: An Empirical Study on Early Exit without Joint Optimization

Dec 02, 2024 (via arXiv)

Predictor-Corrector Enhanced Transformers with Exponential Moving Average Coefficient Learning

Nov 05, 2024 (via arXiv)

Scaling Laws Across Model Architectures: A Comparative Analysis of Dense and MoE Models in Large Language Models

Oct 08, 2024 (via arXiv)

Forgetting Curve: A Reliable Method for Evaluating Memorization Capability for Long-context Models

Oct 07, 2024 (via arXiv)

ReMamba: Equip Mamba with Effective Long-Sequence Modeling

Sep 01, 2024 (via arXiv)

NDP: Next Distribution Prediction as a More Broad Target

Aug 30, 2024 (via arXiv)

Translate-and-Revise: Boosting Large Language Models for Constrained Translation

Jul 18, 2024 (via arXiv)

Hybrid Alignment Training for Large Language Models

Jun 21, 2024 (via arXiv)

3AM: An Ambiguity-Aware Multi-Modal Machine Translation Dataset

Apr 29, 2024 (via arXiv)

Large Language Models are Parallel Multilingual Learners

Mar 14, 2024 (via arXiv)