Bei Li

Predictor-Corrector Enhanced Transformers with Exponential Moving Average Coefficient Learning

Nov 05, 2024
Via arXiv

Scaling Laws Across Model Architectures: A Comparative Analysis of Dense and MoE Models in Large Language Models

Oct 08, 2024
Via arXiv

Forgetting Curve: A Reliable Method for Evaluating Memorization Capability for Long-context Models

Oct 07, 2024
Via arXiv

ReMamba: Equip Mamba with Effective Long-Sequence Modeling

Sep 01, 2024
Via arXiv

NDP: Next Distribution Prediction as a More Broad Target

Aug 30, 2024
Via arXiv

Translate-and-Revise: Boosting Large Language Models for Constrained Translation

Jul 18, 2024
Via arXiv

Hybrid Alignment Training for Large Language Models

Jun 21, 2024
Via arXiv

3AM: An Ambiguity-Aware Multi-Modal Machine Translation Dataset

Apr 29, 2024
Via arXiv

Large Language Models are Parallel Multilingual Learners

Mar 14, 2024
Via arXiv

Soft Alignment of Modality Space for End-to-end Speech Translation

Dec 18, 2023
Via arXiv