Dongsoo Lee

SUN: Shared Use of Next-token Prediction for Efficient Multi-LLM Disaggregated Serving

Mar 03, 2026

Affine-Scaled Attention: Towards Flexible and Stable Transformer Attention

Feb 26, 2026

PrefillShare: A Shared Prefill Module for KV Reuse in Multi-LLM Disaggregated Serving

Feb 12, 2026

CodeGEMM: A Codebook-Centric Approach to Efficient GEMM in Quantized LLMs

Dec 19, 2025

Unifying Uniform and Binary-coding Quantization for Accurate Compression of Large Language Models

Jun 04, 2025

An Investigation of FP8 Across Accelerators for LLM Inference

Feb 03, 2025

Debunking the CUDA Myth Towards GPU-based AI Systems

Dec 31, 2024

LRQ: Optimizing Post-Training Quantization for Large Language Models by Learning Low-Rank Weight-Scaling Matrices

Jul 16, 2024

To FP8 and Back Again: Quantifying the Effects of Reducing Precision on LLM Training Stability

May 29, 2024

HyperCLOVA X Technical Report

Apr 13, 2024