Bei Li

Beyond Decoder-only: Large Language Models Can be Good Encoders for Machine Translation

Mar 09, 2025
Via arXiv

Earlier Tokens Contribute More: Learning Direct Preference Optimization From Temporal Decay Perspective

Feb 20, 2025
Via arXiv

Optimizing Speech Multi-View Feature Fusion through Conditional Computation

Jan 14, 2025
Via arXiv

SLAM: Towards Efficient Multilingual Reasoning via Selective Language Alignment

Jan 07, 2025
Via arXiv

Disentangling Preference Representation and Text Generation for Efficient Individual Preference Alignment

Dec 30, 2024
Via arXiv

Early Exit Is a Natural Capability in Transformer-based Models: An Empirical Study on Early Exit without Joint Optimization

Dec 02, 2024
Via arXiv

Predictor-Corrector Enhanced Transformers with Exponential Moving Average Coefficient Learning

Nov 05, 2024
Via arXiv

Scaling Laws Across Model Architectures: A Comparative Analysis of Dense and MoE Models in Large Language Models

Oct 08, 2024
Via arXiv

Forgetting Curve: A Reliable Method for Evaluating Memorization Capability for Long-context Models

Oct 07, 2024
Via arXiv

ReMamba: Equip Mamba with Effective Long-Sequence Modeling

Sep 01, 2024
Via arXiv