Ruihang Lai

Axe: A Simple Unified Layout Abstraction for Machine Learning Compilers

Jan 27, 2026
Via arXiv

Gecko: An Efficient Neural Architecture Inherently Processing Sequences with Arbitrary Lengths

Jan 10, 2026
Via arXiv

Mirage Persistent Kernel: A Compiler and Runtime for Mega-Kernelizing Tensor Programs

Dec 22, 2025
Via arXiv

FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving

Jan 02, 2025
Via arXiv

WebLLM: A High-Performance In-Browser LLM Inference Engine

Dec 20, 2024
Via arXiv

XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models

Nov 22, 2024
Via arXiv

Emerging Platforms Meet Emerging LLMs: A Year-Long Journey of Top-Down Development

Apr 14, 2024
Via arXiv

Relax: Composable Abstractions for End-to-End Dynamic Machine Learning

Nov 01, 2023
Via arXiv

SparseTIR: Composable Abstractions for Sparse Compilation in Deep Learning

Jul 11, 2022
Via arXiv

TensorIR: An Abstraction for Automatic Tensorized Program Optimization

Jul 09, 2022
Via arXiv