
George A. Constantinides

Imperial College London

BitMoD: Bit-serial Mixture-of-Datatype LLM Acceleration (Nov 18, 2024)

QERA: an Analytical Framework for Quantization Error Reconstruction (Oct 08, 2024)

Exploring FPGA designs for MX and beyond (Jul 01, 2024)

Optimised Grouped-Query Attention Mechanism for Transformers (Jun 21, 2024)

Unlocking the Global Synergies in Low-Rank Adapters (Jun 21, 2024)

NeuraLUT: Hiding Neural Network Density in Boolean Synthesizable Functions (Feb 29, 2024)

LQER: Low-Rank Quantization Error Reconstruction for LLMs (Feb 04, 2024)

Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM Inference? (Oct 21, 2023)

PolyLUT: Learning Piecewise Polynomials for Ultra-Low Latency FPGA LUT-based Inference (Sep 05, 2023)

FPGA Resource-aware Structured Pruning for Real-Time Neural Networks (Aug 09, 2023)