Shoichiro Yamaguchi

When Does Metadata Conditioning (NOT) Work for Language Model Pre-Training? A Study with Context-Free Grammars

Apr 24, 2025

Rei Higuchi, Ryotaro Kawata, Naoki Nishikawa, Kazusato Oko, Shoichiro Yamaguchi, Sosuke Kobayashi, Seiya Tokui, Kohei Hayashi, Daisuke Okanohara, Taiji Suzuki

Abstract: The ability to acquire latent semantics is one of the key properties that determine the performance of language models. One convenient approach to invoke this ability is to prepend metadata (e.g., URLs, domains, and styles) at the beginning of texts in the pre-training data, making it easier for the model to access latent semantics before observing the entire text. Previous studies have reported that this technique actually improves the performance of trained models in downstream tasks; however, this improvement has been observed only in specific downstream tasks, without consistent enhancement in average next-token prediction loss. To understand this phenomenon, we closely investigate how prepending metadata during pre-training affects model performance by examining its behavior using artificial data. Interestingly, we found that this approach produces both positive and negative effects on the downstream tasks. We demonstrate that the effectiveness of the approach depends on whether latent semantics can be inferred from the downstream task's prompt. Specifically, through investigations using data generated by probabilistic context-free grammars, we show that training with metadata helps improve the model's performance when the given context is long enough to infer the latent semantics. In contrast, the technique negatively impacts performance when the context lacks the necessary information to make an accurate posterior inference.
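
As a rough illustration of what "prepending metadata" can look like in a data pipeline, here is a minimal sketch. The function name `prepend_metadata`, the `<key:value>` tag format, and the separator are assumptions made for this example only; the abstract does not specify how the authors format metadata.

```python
def prepend_metadata(text: str, metadata: dict[str, str], sep: str = "\n") -> str:
    """Return `text` with a metadata header placed in front of it.

    The "<key:value>" tag format is a hypothetical choice for illustration,
    not the format used in the paper.
    """
    # Build one tag per metadata field, e.g. "<url:example.com>".
    header = " ".join(f"<{key}:{value}>" for key, value in metadata.items())
    # With no metadata, leave the text unchanged.
    return f"{header}{sep}{text}" if header else text


example = prepend_metadata("Some document body.", {"url": "example.com", "style": "news"})
print(example)
```

A model trained on such examples sees the metadata tokens before the body, which is the setting whose benefits and drawbacks the abstract describes.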

Virtual Human Generative Model: Masked Modeling Approach for Learning Human Characteristics

Jun 19, 2023

Kenta Oono, Nontawat Charoenphakdee, Kotatsu Bito, Zhengyan Gao, Yoshiaki Ota, Shoichiro Yamaguchi, Yohei Sugawara, Shin-ichi Maeda, Kunihiko Miyoshi, Yuki Saito(+3 more)

Abstract: Identifying the relationship between healthcare attributes, lifestyles, and personality is vital for understanding and improving physical and mental conditions. Machine learning approaches are promising for modeling their relationships and offering actionable suggestions. In this paper, we propose Virtual Human Generative Model (VHGM), a machine learning model for estimating attributes about healthcare, lifestyles, and personalities. VHGM is a deep generative model trained with masked modeling to learn the joint distribution of attributes conditioned on known ones. Using heterogeneous tabular datasets, VHGM learns more than 1,800 attributes efficiently. We numerically evaluate the performance of VHGM and its training techniques. As a proof-of-concept of VHGM, we present several applications demonstrating user scenarios, such as virtual measurements of healthcare attributes and hypothesis verifications of lifestyles.

* 14 pages, 4 figures

Collision-free Path Planning on Arbitrary Optimization Criteria in the Latent Space through cGANs

Feb 26, 2022

Tomoki Ando, Hiroto Iino, Hiroki Mori, Ryota Torishima, Kuniyuki Takahashi, Shoichiro Yamaguchi, Daisuke Okanohara, Tetsuya Ogata

Abstract: We propose a new method for collision-free path planning with Conditional Generative Adversarial Networks (cGANs) that maps the latent space to only the collision-free areas of the robot joint space when an obstacle map is given as a condition. When manipulating a robot arm, it is necessary to generate a trajectory that avoids contact with the robot itself or the surrounding environment for safety reasons, and it is convenient to generate multiple arbitrary trajectories appropriate for respective purposes. In the proposed method, various trajectories to avoid obstacles can be generated by connecting the start and goal with arbitrary line segments in this latent space. Our method simply provides this collision-free latent space, after which any planner, using any optimization conditions, can be used to generate the most suitable paths on the fly. We successfully verified this method with a simulated and actual UR5e 6-DoF robotic arm. We confirmed that different trajectories can be generated according to different optimization conditions.

* 8 pages, 6 figures. Submitted to RA-L (IEEE Robotics and Automation Letters) with IROS 2022 Option. An accompanying video is available at https://www.youtube.com/watch?v=bZTbWxLt6Bo. arXiv admin note: substantial text overlap with arXiv:2202.07203
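
The idea of "connecting the start and goal with line segments in the latent space" can be sketched as plain linear interpolation between two latent vectors. This is only an illustrative sketch: the function name `latent_line_segment` is hypothetical, and the trained cGAN generator that would map each latent point (together with an obstacle map) back to robot joint angles is assumed to exist and is not shown.

```python
import numpy as np


def latent_line_segment(z_start, z_goal, num_points=10):
    """Sample evenly spaced points on the straight segment from z_start to z_goal.

    Each returned row is a latent vector; a trained generator (not shown,
    assumed here) would decode every row into a joint configuration.
    """
    z_start = np.asarray(z_start, dtype=float)
    z_goal = np.asarray(z_goal, dtype=float)
    # Interpolation weights from 0 to 1, shaped (num_points, 1) for broadcasting.
    t = np.linspace(0.0, 1.0, num_points)[:, None]
    return (1.0 - t) * z_start + t * z_goal


path = latent_line_segment([0.0, 0.0], [1.0, 2.0], num_points=5)
print(path.shape)
```

Chaining several such segments through intermediate latent points would give a piecewise-linear latent path, which a planner could then optimize under whatever criterion it prefers.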

Collision-free Path Planning in the Latent Space through cGANs

Feb 15, 2022

Tomoki Ando, Hiroki Mori, Ryota Torishima, Kuniyuki Takahashi, Shoichiro Yamaguchi, Daisuke Okanohara, Tetsuya Ogata

Abstract: We show a new method for collision-free path planning with cGANs that maps the latent space to only the collision-free areas of the robot joint space. Our method simply provides this collision-free latent space, after which any planner, using any optimization conditions, can be used to generate the most suitable paths on the fly. We successfully verified this method with a simulated two-link robot arm.

* 10 pages, 9 figures

Out-of-Distribution Generalization with Maximal Invariant Predictor

Aug 04, 2020

Masanori Koyama, Shoichiro Yamaguchi

Abstract: The Out-of-Distribution (OOD) generalization problem is the problem of seeking the predictor function whose performance in the worst environments is optimal. This paper makes two contributions to the OOD problem. We first use basic results of probability to prove the Maximal Invariant Predictor (MIP) condition, a theoretical result that can be used to identify the OOD optimal solution. We then use our MIP to derive the inner-environmental Gradient Alignment (IGA) algorithm, which can be used to help seek the OOD optimal predictor. Previous studies that have investigated the theoretical aspect of the OOD problem use strong structural assumptions such as a causal DAG. However, in cases involving image datasets, for example, the identification of hidden structural relations is itself a difficult problem. Our theoretical results differ from those of many previous studies in that they can be applied to cases in which the underlying structure of a dataset is difficult to analyze. We present an extensive comparison of previous theoretical approaches to OOD problems based on the assumptions they make. We also present an extension of Colored-MNIST that can more accurately represent the pathological OOD situation than the original version, and demonstrate the superiority of IGA over previous methods on both the original and the extended version of Colored-MNIST.

MANGA: Method Agnostic Neural-policy Generalization and Adaptation

Nov 19, 2019

Homanga Bharadhwaj, Shoichiro Yamaguchi, Shin-ichi Maeda

Abstract: In this paper, we target the problem of transferring policies across multiple environments with different dynamics parameters and motor noise variations, by introducing a framework that decouples the processes of policy learning and system identification. Efficiently transferring learned policies to an unknown environment with changes in dynamics configurations in the presence of motor noise is very important for operating robots in the real world, and our work is a novel attempt in that direction. We introduce MANGA: Method Agnostic Neural-policy Generalization and Adaptation, which trains dynamics-conditioned policies and efficiently learns to estimate the dynamics parameters of the environment given off-policy state-transition rollouts in the environment. Our scheme is agnostic to the type of training method used: both reinforcement learning (RL) and imitation learning (IL) strategies can be used. We demonstrate the effectiveness of our approach by experimenting with four different MuJoCo agents and comparing against previously proposed transfer baselines.

* Under review. Video available at https://drive.google.com/file/d/12GsDq3iQDXEutE-xpzXxqrEfD6dYhKjs/view?usp=sharing. Other details will be made available on the author's webpage, www.homangabharadhwaj.com

Motion Generation Considering Situation with Conditional Generative Adversarial Networks for Throwing Robots

Oct 08, 2019

Kyo Kutsuzawa, Hitoshi Kusano, Ayaka Kume, Shoichiro Yamaguchi

Abstract: When robots work in a cluttered environment, the constraints for motions change frequently and the required action can change even for the same task. However, planning complex motions by direct calculation risks converging to poorly performing local optima. In addition, machine learning approaches often require relearning for novel situations. In this paper, we propose a method of searching for appropriate motions by using conditional Generative Adversarial Networks (cGANs), which can generate motions based on the conditions by mimicking training datasets. By training cGANs with various motions for a task, their latent space is filled with valid motions for the task. The appropriate motions can be found efficiently by searching the latent space of the trained cGANs instead of the motion space, while avoiding poor local optima. We demonstrate that the proposed method works successfully for an object-throwing task to given target positions in both numerical simulation and real-robot experiments. The proposed method resulted in three times higher accuracy with 2.5 times faster calculation time than searching the action space directly.

Data Interpolating Prediction: Alternative Interpretation of Mixup

Jun 20, 2019

Takuya Shimada, Shoichiro Yamaguchi, Kohei Hayashi, Sosuke Kobayashi

Abstract: Data augmentation by mixing samples, such as Mixup, has been widely used, typically for classification tasks. However, this strategy is not always effective due to the gap between augmented samples for training and original samples for testing. This gap may prevent a classifier from learning the optimal decision boundary and increase the generalization error. To overcome this problem, we propose an alternative framework called Data Interpolating Prediction (DIP). Unlike common data augmentations, we encapsulate the sample-mixing process in the hypothesis class of a classifier so that train and test samples are treated equally. We derive the generalization bound and show that DIP helps to reduce the original Rademacher complexity. Also, we empirically demonstrate that DIP can outperform existing Mixup.

* Presented at the 2nd Learning from Limited Labeled Data (LLD) Workshop at ICLR 2019
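
For readers unfamiliar with the sample-mixing step the abstract refers to, the following is a minimal sketch of standard Mixup on a single pair of examples. It is not an implementation of DIP, whose details the abstract does not give; the function name `mixup_pair` and the Beta(1, 1) choice for the mixing coefficient are assumptions for illustration.

```python
import numpy as np


def mixup_pair(x1, y1, x2, y2, lam):
    """Convexly combine two (input, one-hot label) pairs with weight lam in [0, 1]."""
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y


rng = np.random.default_rng(0)
# Mixing coefficient drawn from a Beta distribution; alpha = 1.0 is an assumed setting.
lam = rng.beta(1.0, 1.0)

x_a, y_a = np.array([1.0, 0.0]), np.array([1.0, 0.0])
x_b, y_b = np.array([0.0, 1.0]), np.array([0.0, 1.0])
x_mix, y_mix = mixup_pair(x_a, y_a, x_b, y_b, lam)
print(x_mix, y_mix)
```

In plain Mixup this mixing is applied only at training time, which is the train/test gap the abstract points to; DIP, as described there, instead places the mixing inside the classifier's hypothesis class.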

Semi-flat minima and saddle points by embedding neural networks to overparameterization

Jun 14, 2019

Kenji Fukumizu, Shoichiro Yamaguchi, Yoh-ichi Mototake, Mirai Tanaka

Abstract: We theoretically study the landscape of the training error for neural networks in overparameterized cases. We consider three basic methods for embedding a network into a wider one with more hidden units, and discuss whether a minimum point of the narrower network gives a minimum or saddle point of the wider one. Our results show that networks with smooth and ReLU activations have different partially flat landscapes around the embedded point. We also relate these results to a difference in their generalization abilities in overparameterized realization.

* 38 pages, 4 figures

A Differentiable Gaussian-like Distribution on Hyperbolic Space for Gradient-Based Learning

Feb 08, 2019

Yoshihiro Nagano, Shoichiro Yamaguchi, Yasuhiro Fujita, Masanori Koyama

Abstract: Hyperbolic space is a geometry that is known to be well-suited for representation learning of data with an underlying hierarchical structure. In this paper, we present a novel hyperbolic distribution called the pseudo-hyperbolic Gaussian, a Gaussian-like distribution on hyperbolic space whose density can be evaluated analytically and differentiated with respect to the parameters. Our distribution enables the gradient-based learning of probabilistic models on hyperbolic space that could never have been considered before. Also, we can sample from this hyperbolic probability distribution without resorting to auxiliary means like rejection sampling. As applications of our distribution, we develop a hyperbolic analog of the variational autoencoder and a method of probabilistic word embedding on hyperbolic space. We demonstrate the efficacy of our distribution on various datasets including MNIST, Atari 2600 Breakout, and WordNet.

* 17 pages, 12 figures
