Picture for Bei Li

Bei Li

Optimizing Speech Multi-View Feature Fusion through Conditional Computation

Add code
Jan 14, 2025
Viaarxiv icon

SLAM: Towards Efficient Multilingual Reasoning via Selective Language Alignment

Add code
Jan 07, 2025
Viaarxiv icon

Disentangling Preference Representation and Text Generation for Efficient Individual Preference Alignment

Add code
Dec 30, 2024
Viaarxiv icon

Early Exit Is a Natural Capability in Transformer-based Models: An Empirical Study on Early Exit without Joint Optimization

Add code
Dec 02, 2024
Figure 1 for Early Exit Is a Natural Capability in Transformer-based Models: An Empirical Study on Early Exit without Joint Optimization
Figure 2 for Early Exit Is a Natural Capability in Transformer-based Models: An Empirical Study on Early Exit without Joint Optimization
Figure 3 for Early Exit Is a Natural Capability in Transformer-based Models: An Empirical Study on Early Exit without Joint Optimization
Figure 4 for Early Exit Is a Natural Capability in Transformer-based Models: An Empirical Study on Early Exit without Joint Optimization
Viaarxiv icon

Predictor-Corrector Enhanced Transformers with Exponential Moving Average Coefficient Learning

Add code
Nov 05, 2024
Viaarxiv icon

Scaling Laws Across Model Architectures: A Comparative Analysis of Dense and MoE Models in Large Language Models

Add code
Oct 08, 2024
Viaarxiv icon

Forgetting Curve: A Reliable Method for Evaluating Memorization Capability for Long-context Models

Add code
Oct 07, 2024
Figure 1 for Forgetting Curve: A Reliable Method for Evaluating Memorization Capability for Long-context Models
Figure 2 for Forgetting Curve: A Reliable Method for Evaluating Memorization Capability for Long-context Models
Figure 3 for Forgetting Curve: A Reliable Method for Evaluating Memorization Capability for Long-context Models
Figure 4 for Forgetting Curve: A Reliable Method for Evaluating Memorization Capability for Long-context Models
Viaarxiv icon

ReMamba: Equip Mamba with Effective Long-Sequence Modeling

Add code
Sep 01, 2024
Figure 1 for ReMamba: Equip Mamba with Effective Long-Sequence Modeling
Figure 2 for ReMamba: Equip Mamba with Effective Long-Sequence Modeling
Figure 3 for ReMamba: Equip Mamba with Effective Long-Sequence Modeling
Figure 4 for ReMamba: Equip Mamba with Effective Long-Sequence Modeling
Viaarxiv icon

NDP: Next Distribution Prediction as a More Broad Target

Add code
Aug 30, 2024
Figure 1 for NDP: Next Distribution Prediction as a More Broad Target
Figure 2 for NDP: Next Distribution Prediction as a More Broad Target
Figure 3 for NDP: Next Distribution Prediction as a More Broad Target
Figure 4 for NDP: Next Distribution Prediction as a More Broad Target
Viaarxiv icon

Translate-and-Revise: Boosting Large Language Models for Constrained Translation

Add code
Jul 18, 2024
Viaarxiv icon