Shuji Suzuki

Preferred Elements, Inc.

A Judge-free LLM Open-ended Generation Benchmark Based on the Distributional Hypothesis

Feb 13, 2025

Kentaro Imajo, Masanori Hirano, Shuji Suzuki, Hiroaki Mikami

Abstract: Evaluating the open-ended text generation of large language models (LLMs) is challenging because of the lack of a clear ground truth and the high cost of human or LLM-based assessments. We propose a novel benchmark that evaluates LLMs using n-gram statistics and rules, without relying on human judgement or LLM-as-a-judge approaches. Using 50 question and reference answer sets, we introduce three new metrics based on n-grams and rules: Fluency, Truthfulness, and Helpfulness. Our benchmark strongly correlates with GPT-4o-based evaluations while requiring significantly fewer computational resources, demonstrating its effectiveness as a scalable alternative for assessing LLMs' open-ended generation capabilities.

* 13 pages
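
As a rough illustration of the kind of n-gram statistic such a benchmark can build on, the sketch below measures how many of a generated answer's word bigrams also occur in a reference answer. The function names, the whitespace tokenization, and the overlap score itself are assumptions made for this example; the abstract does not specify how Fluency, Truthfulness, or Helpfulness are computed.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_overlap(candidate, reference, n=2):
    """Fraction of the candidate's n-grams that also appear in the reference.

    Hypothetical score for illustration only; counts are clipped so a
    repeated n-gram is credited at most as often as the reference has it.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


reference = "mount fuji is the highest mountain in japan"
print(ngram_overlap("the highest mountain in japan is mount fuji", reference))
print(ngram_overlap("tokyo tower is a famous landmark", reference))
```

A score like this needs no judge model at evaluation time, only counting, which is consistent with the low computational cost the abstract emphasizes.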

PLaMo-100B: A Ground-Up Language Model Designed for Japanese Proficiency

Oct 10, 2024

Kenshin Abe, Kaizaburo Chubachi, Yasuhiro Fujita, Yuta Hirokawa, Kentaro Imajo, Toshiki Kataoka, Hiroyoshi Komatsu, Hiroaki Mikami, Tsuguo Mogami, Shogo Murai (+9 more)

Abstract: We introduce PLaMo-100B, a large-scale language model designed for Japanese proficiency. The model was trained from scratch using 2 trillion tokens, with architecture such as QK Normalization and Z-Loss to ensure training stability during the training process. Post-training techniques, including Supervised Fine-Tuning and Direct Preference Optimization, were applied to refine the model's performance. Benchmark evaluations suggest that PLaMo-100B performs well, particularly in Japanese-specific tasks, achieving results that are competitive with frontier models like GPT-4.
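
For readers unfamiliar with the term, Z-Loss is commonly described as an auxiliary penalty on the log of the softmax normalizer, which discourages logits from drifting to large magnitudes. The sketch below uses that common formulation; the function names and the coefficient value are illustrative assumptions, and the abstract does not state the exact form or weight used for PLaMo-100B.

```python
import math


def log_sum_exp(logits):
    """Numerically stable log(sum(exp(x))) for a list of floats."""
    m = max(logits)
    return m + math.log(sum(math.exp(x - m) for x in logits))


def z_loss(logits, coef=1e-4):
    """Auxiliary penalty coef * (log Z)^2, where Z is the softmax normalizer.

    coef is an illustrative value, not one taken from the abstract.
    """
    return coef * log_sum_exp(logits) ** 2


small = [0.5, -0.2, 0.1, -0.4]
large = [x + 20.0 for x in small]  # same softmax output, shifted logits
print(z_loss(small), z_loss(large))
```

Softmax is invariant to adding a constant to every logit, so cross-entropy alone cannot distinguish `small` from `large`; the penalty does, and is larger for the shifted logits.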

A Scaling Law for Synthetic-to-Real Transfer: A Measure of Pre-Training

Aug 25, 2021

Hiroaki Mikami, Kenji Fukumizu, Shogo Murai, Shuji Suzuki, Yuta Kikuchi, Taiji Suzuki, Shin-ichi Maeda, Kohei Hayashi

Abstract: Synthetic-to-real transfer learning is a framework in which we pre-train models with synthetically generated images and ground-truth annotations for real tasks. Although synthetic images overcome the data scarcity issue, it remains unclear how the fine-tuning performance scales with pre-trained models, especially in terms of pre-training data size. In this study, we collect a number of empirical observations and uncover the secret. Through experiments, we observe a simple and general scaling law that consistently describes learning curves in various tasks, models, and complexities of synthesized pre-training data. Further, we develop a theory of transfer learning for a simplified scenario and confirm that the derived generalization bound is consistent with our empirical findings.
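
One common way to examine how error scales with data size is to fit a power law to the learning curve on a log-log scale. The sketch below assumes an error of the form a * n^(-alpha) with no irreducible floor; that functional form, the function name, and the synthetic data points are all assumptions for illustration, since the abstract does not give the exact form of the law.

```python
import math


def fit_power_law(sizes, errors):
    """Least-squares fit of log(error) = log(a) - alpha * log(n).

    Returns (a, alpha). Assumes errors decay as a * n**(-alpha).
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), -slope


# Synthetic learning curve generated from a = 2.0, alpha = 0.5 (made-up values).
sizes = [1_000, 4_000, 16_000, 64_000]
errors = [2.0 * n ** -0.5 for n in sizes]
a, alpha = fit_power_law(sizes, errors)
print(f"a = {a:.3f}, alpha = {alpha:.3f}")
```

With real measurements the points will not lie exactly on a line, and a curve that flattens at large n would suggest adding a constant offset term to the model.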

An Inductive Transfer Learning Approach using Cycle-consistent Adversarial Domain Adaptation with Application to Brain Tumor Segmentation

May 11, 2020

Yuta Tokuoka, Shuji Suzuki, Yohei Sugawara

Abstract: With recent advances in supervised machine learning for medical image analysis applications, the annotated medical image datasets of various domains are being shared extensively. Given that the annotation labelling requires medical expertise, such labels should be applied to as many learning tasks as possible. However, the multi-modal nature of each annotated image renders it difficult to share the annotation label among diverse tasks. In this work, we provide an inductive transfer learning (ITL) approach to adopt the annotation label of the source domain datasets to tasks of the target domain datasets using Cycle-GAN based unsupervised domain adaptation (UDA). To evaluate the applicability of the ITL approach, we adopted the brain tissue annotation label on the source domain dataset of Magnetic Resonance Imaging (MRI) images to the task of brain tumor segmentation on the target domain dataset of MRI. The results confirm that the segmentation accuracy of brain tumor segmentation improved significantly. The proposed ITL approach can make significant contribution to the field of medical image analysis, as we develop a fundamental tool to improve and promote various tasks using medical images.

* Proceedings of the 2019 6th International Conference on Biomedical and Bioinformatics Engineering, November 2019, Pages 44-48

Team PFDet's Methods for Open Images Challenge 2019

Oct 25, 2019

Yusuke Niitani, Toru Ogawa, Shuji Suzuki, Takuya Akiba, Tommi Kerola, Kohei Ozaki, Shotaro Sano

Abstract: We present the instance segmentation and the object detection method used by team PFDet for Open Images Challenge 2019. We tackle a massive dataset size, huge class imbalance and federated annotations. Using this method, the team PFDet achieved 3rd and 4th place in the instance segmentation and the object detection track, respectively.

Chainer: A Deep Learning Framework for Accelerating the Research Cycle

Aug 01, 2019

Seiya Tokui, Ryosuke Okuta, Takuya Akiba, Yusuke Niitani, Toru Ogawa, Shunta Saito, Shuji Suzuki, Kota Uenishi, Brian Vogel, Hiroyuki Yamazaki Vincent

Abstract: Software frameworks for neural networks play a key role in the development and application of deep learning methods. In this paper, we introduce the Chainer framework, which intends to provide a flexible, intuitive, and high performance means of implementing the full range of deep learning models needed by researchers and practitioners. Chainer provides acceleration using Graphics Processing Units with a familiar NumPy-like API through CuPy, supports general and dynamic models in Python through Define-by-Run, and also provides add-on packages for state-of-the-art computer vision models as well as distributed training.

* Accepted for Applied Data Science Track in KDD'19

Sampling Techniques for Large-Scale Object Detection from Sparsely Annotated Objects

Nov 27, 2018

Yusuke Niitani, Takuya Akiba, Tommi Kerola, Toru Ogawa, Shotaro Sano, Shuji Suzuki

Abstract: Efficient and reliable methods for training of object detectors are in higher demand than ever, and more and more data relevant to the field is becoming available. However, large datasets like Open Images Dataset v4 (OID) are sparsely annotated, and some measure must be taken in order to ensure the training of a reliable detector. In order to take the incompleteness of these datasets into account, one possibility is to use pretrained models to detect the presence of the unverified objects. However, the performance of such a strategy depends largely on the power of the pretrained model. In this study, we propose part-aware sampling, a method that uses human intuition for the hierarchical relation between objects. In terse terms, our method works by making assumptions like "a bounding box for a car should contain a bounding box for a tire". We demonstrate the power of our method on OID and compare the performance against a method based on a pretrained model. Our method also won the first and second place on the public and private test sets of the Google AI Open Images Competition 2018.
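
The containment assumption quoted in the abstract can be expressed as a simple geometric test. The sketch below checks whether a part box lies inside a subject box; the `(x_min, y_min, x_max, y_max)` box format, the function names, and the example coordinates are assumptions, and how the paper uses this relation during sampling is not described in the abstract.

```python
def contains(outer, inner):
    """True if box `inner` lies entirely within box `outer`.

    Boxes are (x_min, y_min, x_max, y_max) tuples (assumed format).
    """
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def parts_inside(subject_box, part_boxes):
    """Return the part boxes that fall inside the subject box."""
    return [box for box in part_boxes if contains(subject_box, box)]


car = (10, 20, 200, 120)
tires = [(30, 90, 60, 120), (150, 90, 180, 120), (220, 90, 250, 120)]
print(parts_inside(car, tires))
```

In this made-up example the third tire box extends past the right edge of the car box, so only the first two are returned.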

PFDet: 2nd Place Solution to Open Images Challenge 2018 Object Detection Track

Sep 04, 2018

Takuya Akiba, Tommi Kerola, Yusuke Niitani, Toru Ogawa, Shotaro Sano, Shuji Suzuki

Abstract: We present a large-scale object detection system by team PFDet. Our system enables training with huge datasets using 512 GPUs, handles sparsely verified classes, and massive class imbalance. Using our method, we achieved 2nd place in the Google AI Open Images Object Detection Track 2018 on Kaggle.

* Technical report for Open Images Challenge 2018 Object Detection Track

Extremely Large Minibatch SGD: Training ResNet-50 on ImageNet in 15 Minutes

Nov 12, 2017

Takuya Akiba, Shuji Suzuki, Keisuke Fukuda

Abstract: We demonstrate that training ResNet-50 on ImageNet for 90 epochs can be achieved in 15 minutes with 1024 Tesla P100 GPUs. This was made possible by using a large minibatch size of 32k. To maintain accuracy with this large minibatch size, we employed several techniques such as RMSprop warm-up, batch normalization without moving averages, and a slow-start learning rate schedule. This paper also describes the details of the hardware and software of the system used to achieve the above performance.

* NIPS'17 Workshop: Deep Learning at Supercomputer Scale
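
The figures in the abstract imply a rough throughput, worked out below. The calculation assumes the ImageNet-1k training set of 1,281,167 images, reads "32k" as 32,768, and treats the full 15 minutes as training time; these are assumptions for a back-of-the-envelope estimate, not numbers reported in the abstract.

```python
import math

TRAIN_IMAGES = 1_281_167   # ImageNet-1k training set size (assumed dataset split)
EPOCHS = 90
WALL_CLOCK_S = 15 * 60
NUM_GPUS = 1024
BATCH_SIZE = 32 * 1024     # "32k" read as 32,768 (assumption)

images_processed = TRAIN_IMAGES * EPOCHS
images_per_second = images_processed / WALL_CLOCK_S
iterations = math.ceil(TRAIN_IMAGES / BATCH_SIZE) * EPOCHS

print(f"cluster throughput: {images_per_second:,.0f} images/s")
print(f"per-GPU throughput: {images_per_second / NUM_GPUS:.1f} images/s")
print(f"iterations: {iterations}, about {WALL_CLOCK_S / iterations:.2f} s each")
```

Under these assumptions each GPU handles 32 images per step, and the whole run has only a few thousand parameter updates, which helps explain why the abstract stresses techniques for preserving accuracy at this minibatch size.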

ChainerMN: Scalable Distributed Deep Learning Framework

Oct 31, 2017

Takuya Akiba, Keisuke Fukuda, Shuji Suzuki

Abstract: One of the keys for deep learning to have made a breakthrough in various fields was to utilize high computing powers centering around GPUs. Enabling the use of further computing abilities by distributed processing is essential not only to make the deep learning bigger and faster but also to tackle unsolved challenges. We present the design, implementation, and evaluation of ChainerMN, the distributed deep learning framework we have developed. We demonstrate that ChainerMN can scale the learning process of the ResNet-50 model to the ImageNet dataset up to 128 GPUs with the parallel efficiency of 90%.
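
Parallel efficiency is conventionally the achieved speedup divided by the number of workers. Under that definition, and assuming a single-GPU baseline (the abstract does not state the baseline), 90% efficiency on 128 GPUs corresponds to roughly a 115x speedup. The helper names below are hypothetical.

```python
def parallel_efficiency(speedup, num_workers):
    """Achieved speedup as a fraction of ideal linear speedup."""
    return speedup / num_workers


def implied_speedup(efficiency, num_workers):
    """Speedup implied by a given efficiency under the same definition."""
    return efficiency * num_workers


speedup = implied_speedup(0.90, 128)
print(f"implied speedup on 128 GPUs: {speedup:.1f}x")
print(f"efficiency check: {parallel_efficiency(speedup, 128):.2f}")
```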
