Daya Khudia

Is the GPU Half-Empty or Half-Full? Practical Scheduling Techniques for LLMs

Oct 23, 2024

MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining

Jan 16, 2024

Low-Precision Hardware Architectures Meet Recommendation Model Inference at Scale

May 26, 2021

FBGEMM: Enabling High-Performance Low-Precision Deep Learning Inference

Jan 13, 2021

Deep Learning Inference in Facebook Data Centers: Characterization, Performance Optimizations and Hardware Implications

Nov 29, 2018