Douglas Orr

Approximate Top-$k$ for Increased Parallelism

Dec 05, 2024

u-$μ$P: The Unit-Scaled Maximal Update Parametrization

Jul 24, 2024

SparQ Attention: Bandwidth-Efficient LLM Inference

Dec 08, 2023

PopSparse: Accelerated block sparse matrix multiplication on IPU

Apr 05, 2023

Unit Scaling: Out-of-the-Box Low-Precision Training

Mar 20, 2023

BESS: Balanced Entity Sampling and Sharing for Large-Scale Knowledge Graph Completion

Nov 22, 2022

Towards Structured Dynamic Sparse Pre-Training of BERT

Aug 13, 2021

GroupBERT: Enhanced Transformer Architecture with Efficient Grouped Structures

Jun 10, 2021

N-grams Bayesian Differential Privacy

Jan 29, 2021