Yimeng Wu

Scaling Law for Language Models Training Considering Batch Size

Dec 02, 2024

ParaLBench: A Large-Scale Benchmark for Computational Paralinguistics over Acoustic Foundation Models

Nov 14, 2024

AraMUS: Pushing the Limits of Data and Model Scale for Arabic Natural Language Processing

Jun 11, 2023

Revisiting Pre-trained Language Models and their Evaluation for Arabic Natural Language Understanding

May 21, 2022

JABER and SABER: Junior and Senior Arabic BERt

Jan 09, 2022

ALP-KD: Attention-Based Layer Projection for Knowledge Distillation

Dec 27, 2020

Why Skip If You Can Combine: A Simple Knowledge Distillation Technique for Intermediate Layers

Oct 06, 2020