Dwarak Talupuru

Rope to Nope and Back Again: A New Hybrid Attention Strategy

Jan 30, 2025

Aya Expanse: Combining Research Breakthroughs for a New Multilingual Frontier

Dec 05, 2024

Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models

Nov 19, 2024

Aya 23: Open Weight Releases to Further Multilingual Progress

May 23, 2024