Razvan Pascanu

Google DeepMind

Why do LLMs attend to the first token?

Apr 03, 2025
Via arXiv

NoProp: Training Neural Networks without Back-propagation or Forward-propagation

Mar 31, 2025
Via arXiv

How do language models learn facts? Dynamics, curricula and hallucinations

Mar 27, 2025
Via arXiv

Agency Is Frame-Dependent

Feb 06, 2025
Via arXiv

Torque-Aware Momentum

Dec 25, 2024
Via arXiv

TRecViT: A Recurrent Video Transformer

Dec 18, 2024
Via arXiv

Non-Stationary Learning of Neural Networks with Automatic Soft Parameter Reset

Nov 06, 2024
Via arXiv

A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks

Oct 29, 2024
Via arXiv

Retrieval-Augmented Decision Transformer: External Memory for In-context RL

Oct 09, 2024
Via arXiv

Round and Round We Go! What makes Rotary Positional Encodings useful?

Oct 08, 2024
Via arXiv