Róbert Csordás

GIM: Improved Interpretability for Large Language Models

May 23, 2025

Do Language Models Use Their Depth Efficiently?

May 20, 2025

Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing

May 01, 2025

Measuring In-Context Computation Complexity via Hidden State Prediction

Mar 17, 2025

MrT5: Dynamic Token Merging for Efficient Byte-level Language Models

Oct 28, 2024

Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations

Aug 20, 2024

MoEUT: Mixture-of-Experts Universal Transformers

May 25, 2024

SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention

Dec 14, 2023

Automating Continual Learning

Dec 01, 2023

Practical Computational Power of Linear Transformers and Their Recurrent and Self-Referential Extensions

Oct 24, 2023