Mikhail Burtsev

GradMem: Learning to Write Context into Memory with Test-Time Gradient Descent

Mar 14, 2026

Q-RAG: Long Context Multi-step Retrieval via Value-based Embedder Training

Nov 10, 2025

Limitations of Normalization in Attention Mechanism

Aug 25, 2025

Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity

Feb 18, 2025

Learning Elementary Cellular Automata with Transformers

Dec 02, 2024

Associative Recurrent Memory Transformer

Jul 05, 2024

AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents

Jul 05, 2024

Complexity of Symbolic Representation in Working Memory of Transformer Correlates with the Complexity of a Task

Jun 20, 2024

BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack

Jun 14, 2024

In Search of Needles in a 11M Haystack: Recurrent Memory Finds What LLMs Miss

Feb 21, 2024