Ru Huang

MCUBERT: Memory-Efficient BERT Inference on Commodity Microcontrollers

Oct 23, 2024
Via arXiv

AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference

Aug 19, 2024
Via arXiv

The Exploration-Exploitation Dilemma Revisited: An Entropy Perspective

Aug 19, 2024
Via arXiv

GAIA: Rethinking Action Quality Assessment for AI-Generated Videos

Jun 10, 2024
Via arXiv

FastQuery: Communication-efficient Embedding Table Query for Private LLM Inference

May 25, 2024
Via arXiv

EasyACIM: An End-to-End Automated Analog CIM with Synthesizable Architecture and Agile Design Space Exploration

Apr 12, 2024
Via arXiv

PDNNet: PDN-Aware GNN-CNN Heterogeneous Network for Dynamic IR Drop Prediction

Mar 27, 2024
Via arXiv

ProPD: Dynamic Token Tree Pruning and Generation for LLM Parallel Decoding

Feb 21, 2024
Via arXiv

AttentionLego: An Open-Source Building Block For Spatially-Scalable Large Language Model Accelerator With Processing-In-Memory Technology

Jan 21, 2024
Via arXiv

FS-BAND: A Frequency-Sensitive Banding Detector

Nov 30, 2023
Via arXiv