Jingwen Leng

Nimbus: Secure and Efficient Two-Party Inference for Transformers
Nov 24, 2024

vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving
Jul 22, 2024

Fovea Transformer: Efficient Long-Context Modeling with Structured Fine-to-Coarse Attention
Nov 13, 2023

Accelerating Generic Graph Neural Networks via Architecture, Compiler, Partition Method Co-Design
Aug 16, 2023

AdaptGear: Accelerating GNN Training via Adaptive Subgraph-Level Kernels on GPUs
May 27, 2023

Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator
May 24, 2023

Nesting Forward Automatic Differentiation for Memory-Efficient Deep Neural Network Training
Sep 22, 2022

ANT: Exploiting Adaptive Numerical Data Type for Low-bit Deep Neural Network Quantization
Aug 30, 2022

Efficient Activation Quantization via Adaptive Rounding Border for Post-Training Quantization
Aug 25, 2022

SALO: An Efficient Spatial Accelerator Enabling Hybrid Sparse Attention Mechanisms for Long Sequences
Jun 29, 2022