Zhuocheng Gong

FIRP: Faster LLM inference via future intermediate representation prediction

Oct 27, 2024
Via arXiv

Graph-Structured Speculative Decoding

Jul 23, 2024
Via arXiv

Mixture-of-Modules: Reinventing Transformers as Dynamic Assemblies of Modules

Jul 09, 2024
Via arXiv

Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration

Apr 18, 2024
Via arXiv

What Makes Quantization for Large Language Models Hard? An Empirical Study from the Lens of Perturbation

Mar 11, 2024
Via arXiv

Improving Input-label Mapping with Demonstration Replay for In-context Learning

Oct 30, 2023
Via arXiv

PreQuant: A Task-agnostic Quantization Approach for Pre-trained Language Models

May 30, 2023
Via arXiv