Beren Millidge

University of Oxford

Online Vector Quantized Attention

Feb 03, 2026
Via arXiv

Equivalence of Personalized PageRank and Successor Representations

Dec 31, 2025
Via arXiv

Generalising E-prop to Deep Networks

Dec 30, 2025
Via arXiv

Compressed Convolutional Attention: Efficient Attention in a Compressed Latent Space

Oct 06, 2025
Via arXiv

Mixture-of-PageRanks: Replacing Long-Context with Real-Time, Sparse GraphRAG

Dec 08, 2024
Via arXiv

The Zamba2 Suite: Technical Report

Nov 22, 2024
Via arXiv

Zyda-2: a 5 Trillion Token High-Quality Dataset

Nov 09, 2024
Via arXiv

Exploring Action-Centric Representations Through the Lens of Rate-Distortion Theory

Sep 13, 2024
Via arXiv

Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters

Aug 09, 2024
Via arXiv

Zyda: A 1.3T Dataset for Open Language Modeling

Jun 04, 2024
Via arXiv