Abstract:The common pipeline for training deep neural networks consists of several building blocks, such as data augmentation and network architecture selection. AutoML is a research field that aims to design these parts automatically, but most methods explore each part independently, because searching all parts simultaneously is more challenging. In this paper, we propose a joint optimization method for data augmentation policies and network architectures to bring more automation to the design of the training pipeline. The core idea of our approach is to make the whole pipeline differentiable. The proposed method combines differentiable methods for augmentation policy search and network architecture search to jointly optimize them in an end-to-end manner. The experimental results show that our method achieves performance competitive with or superior to independently searched results.
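A minimal sketch of how such a search can be made differentiable, assuming a DARTS-style continuous relaxation; the class name, candidate-op handling, and weighting scheme are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np
import chainer
import chainer.functions as F

class MixedOp(chainer.Chain):
    """Softmax-weighted mixture over candidate operations.

    Because the mixture weights `alpha` are ordinary parameters, the
    discrete choice among candidates (layer types, or augmentation ops)
    is relaxed into a differentiable one, so it can be optimized jointly
    with the network weights by backpropagation.
    """

    def __init__(self, candidate_ops):
        super().__init__()
        with self.init_scope():
            self.ops = chainer.ChainList(*candidate_ops)
            self.alpha = chainer.Parameter(
                np.zeros(len(candidate_ops), dtype=np.float32))

    def forward(self, x):
        # Relaxed, differentiable "choice" among the candidates.
        w = F.softmax(F.reshape(self.alpha, (1, -1)))[0]
        return sum(w[i] * op(x) for i, op in enumerate(self.ops))
```

After the search converges, the candidate with the largest weight can be kept and the rest discarded, as in standard differentiable architecture search.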
Abstract:Scene graph parsing aims to detect objects in an image and recognize their relations. Recent approaches have achieved high average scores on some popular benchmarks, but fail to detect rare relations, as the highly long-tailed distribution of the data biases learning towards frequent labels. Motivated by the fact that detecting these rare relations can be critical in real-world applications, this paper introduces a novel integrated framework of classification and ranking to resolve the class imbalance problem in scene graph parsing. Specifically, we design a new Contrasting Cross-Entropy loss, which promotes the detection of rare relations by suppressing incorrect frequent ones. Furthermore, we propose a novel scoring module, termed Scorer, which learns to rank relations based on image and relation features to improve the recall of predictions. Our framework is simple and effective, and can be incorporated into current scene graph models. Experimental results show that the proposed approach improves on current state-of-the-art methods, with a clear advantage in detecting rare relations.
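A toy sketch in the spirit of the Contrasting Cross-Entropy described above; the paper's exact formulation may differ, and `freq_mask` and `margin` are assumptions introduced for this illustration:

```python
import numpy as np

def contrasting_cross_entropy(logits, target, freq_mask, margin=1.0):
    """Toy contrasting loss: cross-entropy on the true label plus a
    hinge term that pushes down the strongest *incorrect frequent*
    class. `freq_mask` (1 for frequent relation labels) and `margin`
    are assumptions of this sketch, not the paper's definition.
    """
    z = logits - logits.max()                 # numerical stability
    p = np.exp(z) / np.exp(z).sum()
    ce = -np.log(p[target])

    rivals = freq_mask.astype(bool).copy()
    rivals[target] = False                    # exclude the ground truth
    if not rivals.any():
        return ce
    contrast = max(0.0, margin + logits[rivals].max() - logits[target])
    return ce + contrast
```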
Abstract:Software frameworks for neural networks play a key role in the development and application of deep learning methods. In this paper, we introduce the Chainer framework, which is intended to provide a flexible, intuitive, and high-performance means of implementing the full range of deep learning models needed by researchers and practitioners. Chainer provides acceleration on Graphics Processing Units with a familiar NumPy-like API through CuPy, supports general and dynamic models in Python through Define-by-Run, and also provides add-on packages for state-of-the-art computer vision models as well as distributed training.
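A minimal Define-by-Run example in Chainer: the computational graph is recorded while the Python code runs, so ordinary control flow can shape the model. The two-layer MLP here is an arbitrary choice for illustration:

```python
import numpy as np
import chainer
import chainer.functions as F
import chainer.links as L

class MLP(chainer.Chain):
    def __init__(self, n_units, n_out):
        super().__init__()
        with self.init_scope():
            self.l1 = L.Linear(None, n_units)   # input size inferred at first call
            self.l2 = L.Linear(n_units, n_out)

    def forward(self, x):
        # Define-by-Run: the graph is built as this code executes,
        # so branches and loops may depend on the data itself.
        h = F.relu(self.l1(x))
        return self.l2(h)

model = MLP(100, 10)
x = np.random.rand(8, 784).astype(np.float32)
t = np.zeros(8, dtype=np.int32)                 # dummy labels
loss = F.softmax_cross_entropy(model(x), t)     # forward pass builds the graph
loss.backward()                                 # gradients flow through it
```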
Abstract:In this paper, we propose a novel method to efficiently train a Generative Adversarial Network (GAN) on high-dimensional samples. The key idea is to introduce a differentiable subsampling layer that appropriately reduces the dimensionality of intermediate feature maps in the generator during training. In general, generators incur large memory and computational costs in the later stages of the network, where the feature maps become larger, even though those stages have relatively few parameters compared with the earlier ones. This makes training large models for video generation difficult under limited computational resources. We solve this problem by introducing a method that gradually reduces the dimensionality of feature maps in the generator with multiple subsampling layers. We also propose a network (Temporal GAN v2) with such layers and conduct video generation experiments. As a consequence, our model trained on the UCF101 dataset at $192 \times 192$ pixels achieves an Inception Score (IS) of 24.34, a significant improvement over the previous state-of-the-art score of 14.56.
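One plausible reading of such a layer, written as a sketch under our own assumptions (the paper defines the actual operation): during training, a strided subset of frames is kept between generator blocks, so the later, larger feature maps cover fewer frames, while at inference the layer is the identity. NumPy is used only to show the shapes; in an autodiff framework the slicing is gradient-transparent for the kept frames:

```python
import numpy as np

def temporal_subsample(x, stride=2, train=True):
    """Keep every `stride`-th frame during training, with a random
    phase so all frames are covered across iterations; pass everything
    through unchanged at test time.

    x: feature maps of shape (batch, frames, channels, height, width).
    """
    if not train:
        return x
    offset = np.random.randint(stride)      # random temporal phase
    return x[:, offset::stride]
```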
Abstract:Identifying "free-space," or safely driveable regions in the scene ahead, is a fundamental task for autonomous navigation. While this task can be addressed using semantic segmentation, the manual labor involved in creating pixelwise annotations to train the segmentation model is very costly. Although weakly supervised segmentation addresses this issue, most methods are not designed for free-space. In this paper, we observe that homogeneous texture and location are two key characteristics of free-space, and develop a novel, practical framework for free-space segmentation with minimal human supervision. Our experiments show that our framework performs better than other weakly supervised methods while using less supervision. Our work demonstrates the potential for performing free-space segmentation without tedious and costly manual annotation, which will be important for adapting autonomous driving systems to different types of vehicles and environments.
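As a toy illustration of the two cues named above (not the paper's algorithm), homogeneous texture can be approximated by low local intensity variance, and the location prior by favoring the lower part of the image:

```python
import numpy as np

def freespace_prior(gray, grid=8):
    """Toy per-cell score combining the two cues from the abstract;
    an illustration of the observation, not the paper's method.

    gray: 2-D grayscale image as a float array.
    """
    h, w = gray.shape
    # Texture cue: low intensity variance within a cell = homogeneous.
    cells = gray[: h // grid * grid, : w // grid * grid]
    cells = cells.reshape(h // grid, grid, w // grid, grid)
    homogeneity = 1.0 / (1.0 + cells.var(axis=(1, 3)))
    # Location cue: free-space tends to appear in the lower image region.
    location = np.linspace(0.0, 1.0, homogeneity.shape[0])[:, None]
    return homogeneity * location      # per-cell free-space score
```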
Abstract:Despite significant progress of deep learning in the field of computer vision, there has not been a software library that covers these methods in a unifying manner. We introduce ChainerCV, a software library that is intended to fill this gap. ChainerCV supports numerous neural network models as well as software components needed to conduct research in computer vision. These implementations emphasize simplicity, flexibility and good software engineering practices. The library is designed to perform on par with the results reported in published papers and its tools can be used as a baseline for future research in computer vision. Our implementation includes sophisticated models like Faster R-CNN and SSD, and covers tasks such as object detection and semantic segmentation.
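For example, running a pre-trained detector with ChainerCV takes only a few lines ('example.jpg' is a placeholder path; the pre-trained weights are downloaded automatically on first use):

```python
from chainercv.links import FasterRCNNVGG16
from chainercv.utils import read_image

# Faster R-CNN trained on PASCAL VOC 2007.
model = FasterRCNNVGG16(pretrained_model='voc07')
img = read_image('example.jpg')                # CHW float32 array, RGB
bboxes, labels, scores = model.predict([img])  # lists, one entry per image
```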
Abstract:We present an approach for road segmentation that only requires image-level annotations at training time. We leverage distant supervision, which allows us to train our model using images that are different from the target domain. Using large publicly available image databases as distant supervisors, we develop a simple method to automatically generate weak pixel-wise road masks. These are used to iteratively train a fully convolutional neural network, which produces our final segmentation model. We evaluate our method on the Cityscapes dataset, where we compare it with a fully supervised approach. Further, we discuss the trade-off between annotation cost and performance. Overall, our distantly supervised approach achieves 93.8% of the performance of the fully supervised approach, while using orders of magnitude less annotation work.
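The iterative scheme can be sketched as self-training; everything below is a toy stand-in (a grayscale-threshold "model" in place of the fully convolutional network) meant only to show the loop structure:

```python
import numpy as np

def train_threshold_model(images, masks):
    """Stand-in for 'train an FCN': the grayscale threshold halfway
    between the mean road and mean non-road intensities."""
    gray = np.concatenate([im.ravel() for im in images])
    road = np.concatenate([m.ravel() for m in masks])
    return (gray[road].mean() + gray[~road].mean()) / 2.0

def iterative_training(images, weak_masks, n_rounds=3):
    """Weak masks train a model; its predictions become the next
    round's training masks, progressively cleaning the labels."""
    masks = weak_masks
    for _ in range(n_rounds):
        thresh = train_threshold_model(images, masks)
        masks = [im < thresh for im in images]   # toy predict: road = darker
    return thresh, masks
```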
Abstract:In this paper, we propose a generative model, Temporal Generative Adversarial Nets (TGAN), which can learn a semantic representation of unlabeled videos and is capable of generating videos. Unlike existing Generative Adversarial Nets (GAN)-based methods that generate videos with a single generator consisting of 3D deconvolutional layers, our model exploits two different types of generators: a temporal generator and an image generator. The temporal generator takes a single latent variable as input and outputs a set of latent variables, each of which corresponds to an image frame in the video. The image generator transforms this set of latent variables into a video. To deal with the instability of GAN training with such advanced networks, we adopt a recently proposed model, Wasserstein GAN, and propose a novel method to train it stably in an end-to-end manner. The experimental results demonstrate the effectiveness of our method.
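A structural sketch of the two-generator design; the fully connected layers, sizes, and activations are placeholder assumptions, as the actual model uses stacks of 1-D and 2-D deconvolutions:

```python
import numpy as np
import chainer
import chainer.functions as F
import chainer.links as L

class TGANSketch(chainer.Chain):
    """Two-generator structure: one latent -> per-frame latents -> frames.
    Linear layers and 64x64 grayscale frames stand in for the real
    deconvolution stacks."""

    def __init__(self, z_dim=100, n_frames=16):
        super().__init__()
        self.n_frames = n_frames
        with self.init_scope():
            self.g_temporal = L.Linear(z_dim, z_dim * n_frames)  # temporal generator
            self.g_image = L.Linear(z_dim, 64 * 64)              # image generator

    def forward(self, z0):
        b = z0.shape[0]
        # One latent per frame, flattened to (batch * frames, z_dim).
        zs = F.reshape(F.tanh(self.g_temporal(z0)), (b * self.n_frames, -1))
        frames = F.tanh(self.g_image(zs))        # each latent -> one frame
        return F.reshape(frames, (b, self.n_frames, 64, 64))

z0 = np.random.uniform(-1, 1, (2, 100)).astype(np.float32)
video = TGANSketch()(z0)                         # shape: (2, 16, 64, 64)
```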