Wenlei Bao

ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference

Oct 28, 2024

FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion

Jun 12, 2024

NGEMM: Optimizing GEMM for Deep Learning via Compiler-based Techniques

Nov 13, 2019