Behrooz Ghorbani

Tony

GPT-4o System Card

Oct 25, 2024

Order Matters in the Presence of Dataset Imbalance for Multilingual Learning

Dec 11, 2023

Epsilon Sampling Rocks: Investigating Sampling Strategies for Minimum Bayes Risk Decoding for Machine Translation

May 18, 2023

Scaling Laws for Multilingual Neural Machine Translation

Feb 19, 2023

Binarized Neural Machine Translation

Feb 09, 2023

Do Current Multi-Task Optimization Methods in Deep Learning Even Help?

Sep 23, 2022

Adaptive Gradient Methods at the Edge of Stability

Jul 29, 2022

Examining Scaling and Transfer of Language Model Architectures for Machine Translation

Feb 16, 2022

Data Scaling Laws in NMT: The Effect of Noise and Architecture

Feb 04, 2022

A Loss Curvature Perspective on Training Instability in Deep Learning

Oct 08, 2021