Yusuke Iwasawa

ADOPT: Modified Adam Can Converge with Any $β_2$ with the Optimal Rate

Nov 05, 2024

Which Programming Language and What Features at Pre-training Stage Affect Downstream Logical Inference Performance?

Oct 09, 2024

Answer When Needed, Forget When Not: Language Models Pretend to Forget via In-Context Knowledge Unlearning

Oct 01, 2024

Language Models Do Hard Arithmetic Tasks Easily and Hardly Do Easy Arithmetic Tasks

Jun 04, 2024

On the Multilingual Ability of Decoder-based Pre-trained Language Models: Finding and Controlling Language-Specific Neurons

Apr 03, 2024

Interpreting Grokked Transformers in Complex Modular Arithmetic

Feb 27, 2024

Unnatural Error Correction: GPT-4 Can Almost Perfectly Handle Unnatural Scrambled Text

Nov 30, 2023

Grokking Tickets: Lottery Tickets Accelerate Grokking

Oct 30, 2023

Open X-Embodiment: Robotic Learning Datasets and RT-X Models

Oct 17, 2023

Suspicion-Agent: Playing Imperfect Information Games with Theory of Mind Aware GPT-4

Oct 06, 2023