Preferred Elements, Inc.
Abstract:We introduce PLaMo-100B, a large-scale language model designed for Japanese proficiency. The model was trained from scratch using 2 trillion tokens, with architecture such as QK Normalization and Z-Loss to ensure training stability during the training process. Post-training techniques, including Supervised Fine-Tuning and Direct Preference Optimization, were applied to refine the model's performance. Benchmark evaluations suggest that PLaMo-100B performs well, particularly in Japanese-specific tasks, achieving results that are competitive with frontier models like GPT-4.
Abstract:Is multiplication really necessary for deep neural networks? Here we propose just adding two IEEE754 floating-point numbers with an integer-add instruction in place of a floating-point multiplication instruction. We show that ResNet can be trained using this operation with competitive classification accuracy. Our proposal did not require any methods to solve instability and decrease in accuracy, which is common in low-precision training. In some settings, we may obtain equal accuracy to the baseline FP32 result. This method will enable eliminating the multiplications in deep neural-network training and inference.