George A. Constantinides

Imperial College London

PolyLUT: Ultra-low Latency Polynomial Inference with Hardware-Aware Structured Pruning
Jan 14, 2025

ReducedLUT: Table Decomposition with "Don't Care" Conditions
Dec 24, 2024

BitMoD: Bit-serial Mixture-of-Datatype LLM Acceleration
Nov 18, 2024

QERA: an Analytical Framework for Quantization Error Reconstruction
Oct 08, 2024

Exploring FPGA designs for MX and beyond
Jul 01, 2024

Optimised Grouped-Query Attention Mechanism for Transformers
Jun 21, 2024

Unlocking the Global Synergies in Low-Rank Adapters
Jun 21, 2024

NeuraLUT: Hiding Neural Network Density in Boolean Synthesizable Functions
Feb 29, 2024

LQER: Low-Rank Quantization Error Reconstruction for LLMs
Feb 04, 2024

Revisiting Block-based Quantisation: What is Important for Sub-8-bit LLM Inference?
Oct 21, 2023