Tsuguo Mogami

Preferred Elements, Inc.

PLaMo-100B: A Ground-Up Language Model Designed for Japanese Proficiency

Oct 10, 2024

Kenshin Abe, Kaizaburo Chubachi, Yasuhiro Fujita, Yuta Hirokawa, Kentaro Imajo, Toshiki Kataoka, Hiroyoshi Komatsu, Hiroaki Mikami, Tsuguo Mogami, Shogo Murai(+9 more)

Abstract: We introduce PLaMo-100B, a large-scale language model designed for Japanese proficiency. The model was trained from scratch on 2 trillion tokens, using techniques such as QK Normalization and Z-Loss to keep training stable. Post-training techniques, including Supervised Fine-Tuning and Direct Preference Optimization, were applied to refine the model's performance. Benchmark evaluations suggest that PLaMo-100B performs well, particularly on Japanese-specific tasks, achieving results competitive with frontier models such as GPT-4.
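The two stabilization techniques named in the abstract can be illustrated with a small sketch. This is not the PLaMo-100B implementation: the abstract gives no details, so the RMS-style normalization without a learned gain (QK Normalization) and the PaLM-style penalty `coef * (log Z)^2` with `coef = 1e-4` (Z-Loss) are assumptions, and all function names are hypothetical.

```python
import math


def log_sum_exp(logits):
    """Numerically stable log(sum(exp(x))), i.e. log Z of the softmax."""
    m = max(logits)
    return m + math.log(sum(math.exp(x - m) for x in logits))


def cross_entropy_with_z_loss(logits, target, coef=1e-4):
    """Cross-entropy plus a z-loss term coef * (log Z)^2.

    The extra term (coefficient assumed, PaLM-style) pushes log Z toward 0,
    discouraging the output logits from drifting to large magnitudes.
    """
    lse = log_sum_exp(logits)
    ce = lse - logits[target]
    return ce + coef * lse ** 2


def rms_normalize(v, eps=1e-6):
    """Scale a vector to unit root-mean-square (no learned gain; an assumption)."""
    rms = math.sqrt(sum(x * x for x in v) / len(v) + eps)
    return [x / rms for x in v]


def qk_norm_score(q, k):
    """Attention logit with QK normalization.

    q and k are normalized before the scaled dot product, so the logit's
    magnitude stays at most sqrt(d) however large the raw q and k become.
    """
    qn, kn = rms_normalize(q), rms_normalize(k)
    return sum(a * b for a, b in zip(qn, kn)) / math.sqrt(len(q))


if __name__ == "__main__":
    # Scaling q by 100x barely changes the normalized attention logit.
    print(qk_norm_score([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
    print(qk_norm_score([100.0, 200.0, 300.0], [4.0, 5.0, 6.0]))
    print(cross_entropy_with_z_loss([2.0, 0.5, -1.0], target=0))
```

The design point both pieces share is bounding quantities that otherwise grow without limit during long runs: attention logits for QK Normalization and the softmax normalizer for Z-Loss.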

Deep Neural Network Training without Multiplications

Dec 07, 2020

Tsuguo Mogami

Abstract: Is multiplication really necessary for deep neural networks? Here we propose simply adding two IEEE 754 floating-point numbers with an integer-add instruction in place of a floating-point multiplication instruction. We show that ResNet can be trained with this operation and reach competitive classification accuracy. Our proposal does not require any of the methods normally needed to counter the instability and loss of accuracy common in low-precision training. In some settings, we obtain accuracy equal to the FP32 baseline. This method will make it possible to eliminate multiplications from deep neural network training and inference.
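The core trick can be sketched by reinterpreting binary32 bit patterns as integers. Because a float's exponent and mantissa bits roughly encode log2 of its magnitude, adding the integer bit patterns approximates adding logarithms, i.e. multiplying. The abstract does not spell out the exact bit manipulation, so the following choices are assumptions and not the paper's method: handling the sign separately with XOR, subtracting the bit pattern of 1.0f (`0x3F800000`) to remove the doubled exponent bias, and flushing zero or underflowing results to zero. The helper names are hypothetical, and overflow and NaN/infinity inputs are not handled.

```python
import struct

# Bit pattern of 1.0 in IEEE 754 binary32 (exponent bias 127 shifted left by 23).
ONE_BITS = 0x3F800000
SIGN_MASK = 0x80000000
MAG_MASK = 0x7FFFFFFF


def f32_to_bits(x: float) -> int:
    """Round x to binary32 and reinterpret its bits as an unsigned 32-bit int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_f32(b: int) -> float:
    """Reinterpret an unsigned 32-bit int as a binary32 float."""
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]


def approx_mul(a: float, b: float) -> float:
    """Approximate a * b using one integer addition on the magnitude bits.

    Assumes finite inputs and no overflow; zero inputs and underflowing
    results are flushed to a signed zero.
    """
    ba, bb = f32_to_bits(a), f32_to_bits(b)
    sign = (ba ^ bb) & SIGN_MASK          # sign of the product
    ma, mb = ba & MAG_MASK, bb & MAG_MASK
    if ma == 0 or mb == 0:
        return bits_to_f32(sign)          # x * 0 -> signed zero
    mag = ma + mb - ONE_BITS              # integer add, then remove extra bias
    if mag <= 0:
        return bits_to_f32(sign)          # underflow -> signed zero
    return bits_to_f32(sign | mag)


if __name__ == "__main__":
    print(approx_mul(2.0, 3.0))    # 6.0 (exact)
    print(approx_mul(-2.0, 3.0))   # -6.0 (exact)
    print(approx_mul(1.5, 1.5))    # 2.0, versus the exact 2.25
```

The result is exact when at least one mantissa is zero (for example, when one operand is a power of two). Otherwise it underestimates the true product, with a worst-case relative error of about 11%. Whether training tolerates this error is the empirical question the paper addresses with ResNet.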

* 6 pages, 1 figure, accepted as a workshop paper at NeurIPS 2020