Joel Hestness

Crystal: Illuminating LLM Abilities on Language and Code (Nov 06, 2024)

Normalization Layer Per-Example Gradients are Sufficient to Predict Gradient Noise Scale in Transformers (Nov 01, 2024)

Bilingual Adaptation of Monolingual Foundation Models (Jul 13, 2024)

Sparse Maximal Update Parameterization: A Holistic Approach to Sparse Training Dynamics (May 24, 2024)

MediSwift: Efficient Sparse Pre-trained Biomedical Language Models (Mar 01, 2024)

Position Interpolation Improves ALiBi Extrapolation (Oct 18, 2023)

BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model (Sep 20, 2023)

SlimPajama-DC: Understanding Data Combinations for LLM Training (Sep 19, 2023)

Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster (Apr 06, 2023)

RevBiFPN: The Fully Reversible Bidirectional Feature Pyramid Network (Jun 28, 2022)