Zhuoming Chen

MagicPIG: LSH Sampling for Efficient LLM Generation

Oct 21, 2024

Sirius: Contextual Sparsity with Correction for Efficient LLMs

Sep 05, 2024

MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding

Aug 21, 2024

MINI-SEQUENCE TRANSFORMER: Optimizing Intermediate Memory for Long Sequences Training

Jul 22, 2024

SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices

Jun 04, 2024

TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding

Apr 18, 2024

Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding

Feb 29, 2024

GNNPipe: Accelerating Distributed Full-Graph GNN Training with Pipelined Model Parallelism

Aug 19, 2023

SpecInfer: Accelerating Generative LLM Serving with Speculative Inference and Token Tree Verification

May 16, 2023

Quark: A Gradient-Free Quantum Learning Framework for Classification Tasks

Oct 02, 2022