Joel Hestness

Crystal: Illuminating LLM Abilities on Language and Code

Nov 06, 2024

Normalization Layer Per-Example Gradients are Sufficient to Predict Gradient Noise Scale in Transformers

Nov 01, 2024

Bilingual Adaptation of Monolingual Foundation Models

Jul 13, 2024

Sparse maximal update parameterization: A holistic approach to sparse training dynamics

May 24, 2024

MediSwift: Efficient Sparse Pre-trained Biomedical Language Models

Mar 01, 2024

Position Interpolation Improves ALiBi Extrapolation

Oct 18, 2023

BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model

Sep 20, 2023

SlimPajama-DC: Understanding Data Combinations for LLM Training

Sep 19, 2023

Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster

Apr 06, 2023

RevBiFPN: The Fully Reversible Bidirectional Feature Pyramid Network

Jun 28, 2022