Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ryo Hayamizu

MoireDB: Formula-generated Interference-fringe Image Dataset

Feb 03, 2025

Yuto Matsuo, Ryo Hayamizu, Hirokatsu Kataoka, Akio Nakamura

Abstract:Image recognition models have struggled to treat recognition robustness to real-world degradations. In this context, data augmentation methods like PixMix improve robustness but rely on generative arts and feature visualizations (FVis), which have copyright, drawing cost, and scalability issues. We propose MoireDB, a formula-generated interference-fringe image dataset for image augmentation enhancing robustness. MoireDB eliminates copyright concerns, reduces dataset assembly costs, and enhances robustness by leveraging illusory patterns. Experiments show that MoireDB augmented images outperforms traditional Fractal arts and FVis-based augmentations, making it a scalable and effective solution for improving model robustness against real-world degradations.

Via

Access Paper or Ask Questions

SegRCDB: Semantic Segmentation via Formula-Driven Supervised Learning

Sep 29, 2023

Risa Shinoda, Ryo Hayamizu, Kodai Nakashima, Nakamasa Inoue, Rio Yokota, Hirokatsu Kataoka

Figure 1 for SegRCDB: Semantic Segmentation via Formula-Driven Supervised Learning

Figure 2 for SegRCDB: Semantic Segmentation via Formula-Driven Supervised Learning

Figure 3 for SegRCDB: Semantic Segmentation via Formula-Driven Supervised Learning

Figure 4 for SegRCDB: Semantic Segmentation via Formula-Driven Supervised Learning

Abstract:Pre-training is a strong strategy for enhancing visual models to efficiently train them with a limited number of labeled images. In semantic segmentation, creating annotation masks requires an intensive amount of labor and time, and therefore, a large-scale pre-training dataset with semantic labels is quite difficult to construct. Moreover, what matters in semantic segmentation pre-training has not been fully investigated. In this paper, we propose the Segmentation Radial Contour DataBase (SegRCDB), which for the first time applies formula-driven supervised learning for semantic segmentation. SegRCDB enables pre-training for semantic segmentation without real images or any manual semantic labels. SegRCDB is based on insights about what is important in pre-training for semantic segmentation and allows efficient pre-training. Pre-training with SegRCDB achieved higher mIoU than the pre-training with COCO-Stuff for fine-tuning on ADE-20k and Cityscapes with the same number of training images. SegRCDB has a high potential to contribute to semantic segmentation pre-training and investigation by enabling the creation of large datasets without manual annotation. The SegRCDB dataset will be released under a license that allows research and commercial use. Code is available at: https://github.com/dahlian00/SegRCDB

* ICCV2023. Code: https://github.com/dahlian00/SegRCDB, Project page: https://dahlian00.github.io/SegRCDBPage/

Via

Access Paper or Ask Questions

Visual Atoms: Pre-training Vision Transformers with Sinusoidal Waves

Mar 02, 2023

Sora Takashima, Ryo Hayamizu, Nakamasa Inoue, Hirokatsu Kataoka, Rio Yokota

Figure 1 for Visual Atoms: Pre-training Vision Transformers with Sinusoidal Waves

Figure 2 for Visual Atoms: Pre-training Vision Transformers with Sinusoidal Waves

Figure 3 for Visual Atoms: Pre-training Vision Transformers with Sinusoidal Waves

Figure 4 for Visual Atoms: Pre-training Vision Transformers with Sinusoidal Waves

Abstract:Formula-driven supervised learning (FDSL) has been shown to be an effective method for pre-training vision transformers, where ExFractalDB-21k was shown to exceed the pre-training effect of ImageNet-21k. These studies also indicate that contours mattered more than textures when pre-training vision transformers. However, the lack of a systematic investigation as to why these contour-oriented synthetic datasets can achieve the same accuracy as real datasets leaves much room for skepticism. In the present work, we develop a novel methodology based on circular harmonics for systematically investigating the design space of contour-oriented synthetic datasets. This allows us to efficiently search the optimal range of FDSL parameters and maximize the variety of synthetic images in the dataset, which we found to be a critical factor. When the resulting new dataset VisualAtom-21k is used for pre-training ViT-Base, the top-1 accuracy reached 83.7% when fine-tuning on ImageNet-1k. This is close to the top-1 accuracy (84.2%) achieved by JFT-300M pre-training, while the number of images is 1/14. Unlike JFT-300M which is a static dataset, the quality of synthetic datasets will continue to improve, and the current work is a testament to this possibility. FDSL is also free of the common issues associated with real images, e.g. privacy/copyright issues, labeling costs/errors, and ethical biases.

* Accepted to CVPR 2023

Via

Access Paper or Ask Questions

Replacing Labeled Real-image Datasets with Auto-generated Contours

Jun 18, 2022

Hirokatsu Kataoka, Ryo Hayamizu, Ryosuke Yamada, Kodai Nakashima, Sora Takashima, Xinyu Zhang, Edgar Josafat Martinez-Noriega, Nakamasa Inoue, Rio Yokota

Figure 1 for Replacing Labeled Real-image Datasets with Auto-generated Contours

Figure 2 for Replacing Labeled Real-image Datasets with Auto-generated Contours

Figure 3 for Replacing Labeled Real-image Datasets with Auto-generated Contours

Figure 4 for Replacing Labeled Real-image Datasets with Auto-generated Contours

Abstract:In the present work, we show that the performance of formula-driven supervised learning (FDSL) can match or even exceed that of ImageNet-21k without the use of real images, human-, and self-supervision during the pre-training of Vision Transformers (ViTs). For example, ViT-Base pre-trained on ImageNet-21k shows 81.8% top-1 accuracy when fine-tuned on ImageNet-1k and FDSL shows 82.7% top-1 accuracy when pre-trained under the same conditions (number of images, hyperparameters, and number of epochs). Images generated by formulas avoid the privacy/copyright issues, labeling cost and errors, and biases that real images suffer from, and thus have tremendous potential for pre-training general models. To understand the performance of the synthetic images, we tested two hypotheses, namely (i) object contours are what matter in FDSL datasets and (ii) increased number of parameters to create labels affects performance improvement in FDSL pre-training. To test the former hypothesis, we constructed a dataset that consisted of simple object contour combinations. We found that this dataset can match the performance of fractals. For the latter hypothesis, we found that increasing the difficulty of the pre-training task generally leads to better fine-tuning accuracy.

* Accepted to CVPR 2022

Via

Access Paper or Ask Questions