
Geonhwa Jeong

SDQ: Sparse Decomposed Quantization for LLM Inference (Jun 19, 2024)

Demystifying Platform Requirements for Diverse LLM Inference Use Cases (Jun 03, 2024)

Abstracting Sparse DNN Acceleration via Structured Sparse Tensor Decomposition (Mar 12, 2024)

GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM (Mar 11, 2024)

Algorithm-Hardware Co-Design of Distribution-Aware Logarithmic-Posit Encodings for Efficient DNN Inference (Mar 08, 2024)

VEGETA: Vertically-Integrated Extensions for Sparse/Dense GEMM Tile Acceleration on CPUs (Feb 23, 2023)

RASA: Efficient Register-Aware Systolic Array Matrix Engine for CPU (Oct 05, 2021)

Union: A Unified HW-SW Co-Design Ecosystem in MLIR for Evaluating Tensor Operations on Spatial Accelerators (Sep 17, 2021)

Evaluating Spatial Accelerator Architectures with Tiled Matrix-Matrix Multiplication (Jun 19, 2021)

ConfuciuX: Autonomous Hardware Resource Assignment for DNN Accelerators using Reinforcement Learning (Sep 04, 2020)