
Zihao Ye

University of Washington

FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving

Jan 02, 2025

MagicPIG: LSH Sampling for Efficient LLM Generation

Oct 21, 2024

Improving Image De-raining Using Reference-Guided Transformers

Aug 01, 2024

Atom: Low-bit Quantization for Efficient and Accurate LLM Serving

Nov 07, 2023

Relax: Composable Abstractions for End-to-End Dynamic Machine Learning

Nov 01, 2023

Punica: Multi-Tenant LoRA Serving

Oct 28, 2023

SparseTIR: Composable Abstractions for Sparse Compilation in Deep Learning

Jul 11, 2022

TensorIR: An Abstraction for Automatic Tensorized Program Optimization

Jul 09, 2022

FeatGraph: A Flexible and Efficient Backend for Graph Neural Network Systems

Sep 29, 2020

Transformer on a Diet

Feb 14, 2020