
Alexandre Muzio

SEER-MoE: Sparse Expert Efficiency through Regularization for Mixture-of-Experts

Apr 07, 2024

Gating Dropout: Communication-efficient Regularization for Sparsely Activated Transformers

May 28, 2022

Multilingual Machine Translation Systems from Microsoft for WMT21 Shared Task

Nov 03, 2021

Scalable and Efficient MoE Training for Multitask Multilingual Models

Sep 22, 2021

Improving Multilingual Translation by Representation and Gradient Regularization

Sep 10, 2021

Discovering Representation Sprachbund For Multilingual Pre-Training

Sep 01, 2021

DeltaLM: Encoder-Decoder Pre-training for Language Generation and Translation by Augmenting Pretrained Multilingual Encoders

Jun 25, 2021

XLM-T: Scaling up Multilingual Machine Translation with Pretrained Cross-lingual Transformer Encoders

Dec 31, 2020