Zhongfeng Wang

Anda: Unlocking Efficient LLM Inference with a Variable-Length Grouped Activation Data Format

Nov 24, 2024

TaQ-DiT: Time-aware Quantization for Diffusion Transformers

Nov 21, 2024

M$^2$-ViT: Accelerating Hybrid Vision Transformers with Two-Level Mixed Quantization

Oct 10, 2024

Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores

Sep 26, 2024

NASH: Neural Architecture and Accelerator Search for Multiplication-Reduced Hybrid Models

Sep 07, 2024

Co-Designing Binarized Transformer and Hardware Accelerator for Efficient End-to-End Edge Deployment

Jul 16, 2024

P$^2$-ViT: Power-of-Two Post-Training Quantization and Acceleration for Fully Quantized Vision Transformer

May 30, 2024

Trio-ViT: Post-Training Quantization and Acceleration for Softmax-Free Efficient Vision Transformer

May 06, 2024

An FPGA-Based Reconfigurable Accelerator for Convolution-Transformer Hybrid EfficientViT

Mar 29, 2024

An FPGA-Based Accelerator Enabling Efficient Support for CNNs with Arbitrary Kernel Sizes

Feb 22, 2024