Sho Takase

Self-Translate-Train: A Simple but Strong Baseline for Cross-lingual Transfer of Large Language Models

Jun 29, 2024

Large Vocabulary Size Improves Large Language Models

Jun 24, 2024

Spike No More: Stabilizing the Pre-training of Large Language Models

Dec 28, 2023

Exploring Effectiveness of GPT-3 in Grammatical Error Correction: A Study on Performance and Controllability in Prompt-Based Methods

May 29, 2023

Nearest Neighbor Non-autoregressive Text Generation

Aug 26, 2022

Are Neighbors Enough? Multi-Head Neural n-gram can be Alternative to Self-attention

Jul 27, 2022

On Layer Normalizations and Residual Connections in Transformers

Jun 01, 2022

Single Model Ensemble for Subword Regularized Models in Low-Resource Machine Translation

Mar 25, 2022

Interpretability for Language Learners Using Example-Based Grammatical Error Correction

Mar 14, 2022

ExtraPhrase: Efficient Data Augmentation for Abstractive Summarization

Jan 14, 2022