Masaki Waga

SoftMatcha: A Soft and Fast Pattern Matcher for Billion-Scale Corpus Searches

Mar 05, 2025

Hiroyuki Deguchi, Go Kamoda, Yusuke Matsushita, Chihiro Taguchi, Kohei Suenaga, Masaki Waga, Sho Yokoi

Abstract:Researchers and practitioners in natural language processing and computational linguistics frequently observe and analyze real language usage in large-scale corpora. For that purpose, they often employ off-the-shelf pattern-matching tools, such as grep, and keyword-in-context concordancers, which are widely used in corpus linguistics for gathering examples. Nonetheless, these existing techniques rely on surface-level string matching, and thus they cannot handle orthographic variations and paraphrasing, which are notable and common phenomena in any natural language. In addition, existing continuous approaches such as dense vector search tend to be overly coarse, often retrieving texts that are unrelated but share similar topics. Given these challenges, we propose a novel algorithm that achieves soft (or semantic) yet efficient pattern matching by relaxing surface-level matching with word embeddings. Our algorithm is highly scalable with respect to the size of the corpus text because it uses inverted indexes. We have prepared an efficient implementation, and we provide an accessible web tool. Our experiments demonstrate that the proposed method (i) can execute searches on billion-scale corpora in less than a second, which is comparable in speed to surface-level string matching and dense vector search; (ii) can extract harmful instances that semantically match queries from a large set of English and Japanese Wikipedia articles; and (iii) can be effectively applied to corpus-linguistic analyses of Latin, a language with highly diverse inflections.

* Accepted at ICLR2025
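
The idea of relaxing exact matching with word embeddings over an inverted index can be illustrated with a toy sketch. This is not the authors' implementation: the embedding values, the function names (`build_index`, `soft_search`), and the cosine-threshold rule are assumptions made for illustration only.

```python
import math

# Toy 2-d word vectors; hypothetical values chosen only for illustration.
EMB = {
    "big": (1.0, 0.1),
    "large": (0.9, 0.2),
    "cat": (0.1, 1.0),
    "dog": (0.2, 0.9),
    "the": (0.7, 0.7),
}

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def build_index(tokens):
    """Inverted index: word -> list of positions where it occurs."""
    index = {}
    for pos, tok in enumerate(tokens):
        index.setdefault(tok, []).append(pos)
    return index

def soft_search(query, index, threshold):
    """Return start positions where every query word is matched by a
    corpus word whose embedding similarity is at least `threshold`."""
    starts = None
    for offset, q in enumerate(query):
        candidates = set()
        for word, positions in index.items():
            if cosine(EMB[q], EMB[word]) >= threshold:
                # Shift each hit back to the position where the match would start.
                candidates.update(p - offset for p in positions)
        starts = candidates if starts is None else starts & candidates
    return sorted(starts or [])

tokens = "the big cat the large dog".split()
index = build_index(tokens)
print(soft_search(["big", "cat"], index, 0.95))   # [1, 4]: "big cat" and "large dog"
print(soft_search(["big", "cat"], index, 0.999))  # [1]: only the near-exact match
```

Lowering the threshold widens the set of corpus words accepted for each query word, while the intersection of shifted position sets keeps the match contiguous.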

Learning Weighted Finite Automata over the Max-Plus Semiring and its Termination

Jul 13, 2024

Takamasa Okudono, Masaki Waga, Taro Sekiyama, Ichiro Hasuo

Abstract:Active learning of finite automata has been vigorously pursued for the purposes of analysis and explanation of black-box systems. In this paper, we study an L*-style learning algorithm for weighted automata over the max-plus semiring. The max-plus setting exposes a "consistency" issue in the previously studied semiring-generic extension of L*: we show that it can fail to maintain consistency of tables, and can thus make equivalence queries on obviously wrong hypothesis automata. We present a theoretical fix by a mathematically clean notion of column-closedness. We also present a nontrivial and reasonably broad class of weighted languages over the max-plus semiring in which our algorithm terminates.
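
For readers unfamiliar with the max-plus semiring, the following minimal sketch shows how a weighted automaton assigns a weight to a word when "addition" is max and "multiplication" is ordinary +. It only illustrates the standard semantics, not the learning algorithm of the paper; the example automaton and the names `maxplus_vec_mat` and `word_weight` are hypothetical.

```python
NEG_INF = float("-inf")  # the "zero" of the max-plus semiring (no path)

def maxplus_vec_mat(vec, mat):
    """Max-plus product of a row vector and a matrix:
    result[j] = max_i (vec[i] + mat[i][j])."""
    return [
        max(vec[i] + mat[i][j] for i in range(len(vec)))
        for j in range(len(mat[0]))
    ]

def word_weight(initial, transitions, final, word):
    """Weight of `word`: the maximum, over all runs, of the summed weights."""
    vec = list(initial)
    for sym in word:
        vec = maxplus_vec_mat(vec, transitions[sym])
    return max(v + f for v, f in zip(vec, final))

# A hypothetical two-state automaton over the alphabet {a, b}.
initial = [0.0, NEG_INF]
final = [0.0, 0.0]
transitions = {
    "a": [[1.0, NEG_INF], [NEG_INF, 0.0]],
    "b": [[0.0, 0.0], [NEG_INF, 2.0]],
}

print(word_weight(initial, transitions, final, "ab"))   # 1.0
print(word_weight(initial, transitions, final, "abb"))  # 3.0
```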

Temporal Logic Formalisation of ISO 34502 Critical Scenarios: Modular Construction with the RSS Safety Distance

Mar 27, 2024

Jesse Reimann, Nico Mansion, James Haydon, Benjamin Bray, Agnishom Chattopadhyay, Sota Sato, Masaki Waga, Étienne André, Ichiro Hasuo, Naoki Ueda (+1 more)

Abstract:As the development of autonomous vehicles progresses, efficient safety assurance methods become increasingly necessary. Safety assurance methods such as monitoring and scenario-based testing call for formalisation of driving scenarios. In this paper, we develop a temporal-logic formalisation of an important class of critical scenarios in the ISO standard 34502. We use signal temporal logic (STL) as a logical formalism. Our formalisation has two main features: 1) modular composition of logical formulas for systematic and comprehensive formalisation (following the compositional methodology of ISO 34502); 2) use of the RSS distance for defining danger. We find our formalisation comes with few parameters to tune thanks to the RSS distance. We experimentally evaluated our formalisation; using its results, we discuss the validity of our formalisation and its stability with respect to the choice of some parameter values.

* 12 pages, 4 figures, 5 tables. Accepted to SAC 2024
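
The abstract does not state the RSS distance itself. As background, the sketch below computes the longitudinal RSS safe distance as commonly given in the RSS literature (Shalev-Shwartz et al.); how the paper embeds it into STL formulas is not shown here, and the function and parameter names are illustrative assumptions.

```python
def rss_longitudinal_safe_distance(
    v_rear, v_front, rho, a_max_accel, b_min_brake, b_max_brake
):
    """Minimum safe gap (m) between a rear and a front vehicle.

    v_rear, v_front : current speeds (m/s)
    rho             : response time of the rear vehicle (s)
    a_max_accel     : max acceleration of the rear vehicle during rho (m/s^2)
    b_min_brake     : minimum braking the rear vehicle applies afterwards (m/s^2)
    b_max_brake     : maximum braking the front vehicle may apply (m/s^2)
    """
    # Worst case: the rear vehicle accelerates for rho seconds, then brakes gently,
    # while the front vehicle brakes as hard as possible.
    v_rear_after = v_rear + rho * a_max_accel
    d = (
        v_rear * rho
        + 0.5 * a_max_accel * rho**2
        + v_rear_after**2 / (2 * b_min_brake)
        - v_front**2 / (2 * b_max_brake)
    )
    return max(d, 0.0)  # the distance is clamped at zero

print(rss_longitudinal_safe_distance(20.0, 20.0, 1.0, 2.0, 4.0, 8.0))  # 56.5
```

A scenario would then count as dangerous when the actual gap falls below this value; the parameter values above are arbitrary and not taken from the paper.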

BOREx: Bayesian-Optimization--Based Refinement of Saliency Map for Image- and Video-Classification Models

Oct 31, 2022

Atsushi Kikuchi, Kotaro Uchida, Masaki Waga, Kohei Suenaga

Abstract:Explaining a classification result produced by an image- and video-classification model is one of the important but challenging issues in computer vision. Many methods have been proposed for producing heat-map--based explanations for this purpose, including ones based on the white-box approach that uses the internal information of a model (e.g., LRP, Grad-CAM, and Grad-CAM++) and ones based on the black-box approach that does not use any internal information (e.g., LIME, SHAP, and RISE). We propose a new black-box method BOREx (Bayesian Optimization for Refinement of visual model Explanation) to refine a heat map produced by any method. Our observation is that a heat-map--based explanation can be seen as a prior for an explanation method based on Bayesian optimization. Based on this observation, BOREx conducts Gaussian process regression (GPR) to estimate the saliency of each pixel in a given image starting from the one produced by another explanation method. Our experiments statistically demonstrate that the refinement by BOREx improves low-quality heat maps for image- and video-classification results.

* 32 pages. To appear in ACCV 2022

Dynamic Shielding for Reinforcement Learning in Black-Box Environments

Jul 27, 2022

Masaki Waga, Ezequiel Castellano, Sasinee Pruekprasert, Stefan Klikovits, Toru Takisaka, Ichiro Hasuo

Abstract:It is challenging to use reinforcement learning (RL) in cyber-physical systems due to the lack of safety guarantees during learning. Although there have been various proposals to reduce undesired behaviors during learning, most of these techniques require prior system knowledge, and their applicability is limited. This paper aims to reduce undesired behaviors during learning without requiring any prior system knowledge. We propose dynamic shielding: an extension of a model-based safe RL technique called shielding using automata learning. The dynamic shielding technique constructs an approximate system model in parallel with RL using a variant of the RPNI algorithm and suppresses undesired explorations due to the shield constructed from the learned model. Through this combination, potentially unsafe actions can be foreseen before the agent experiences them. Experiments show that our dynamic shield significantly decreases the number of undesired events during training.

* This is the author (and extended) version of the manuscript of the same name published in the proceedings of the 20th International Symposium on Automated Technology for Verification and Analysis (ATVA 2022)

Genetic Algorithm for the Weight Maximization Problem on Weighted Automata

Apr 11, 2020

Elena Gutiérrez, Takamasa Okudono, Masaki Waga, Ichiro Hasuo

Abstract:The weight maximization problem (WMP) is the problem of finding the word of highest weight on a weighted finite state automaton (WFA). It is an essential question that emerges in many optimization problems in automata theory. Unfortunately, the general problem can be shown to be undecidable, whereas its bounded decisional version is NP-complete. Designing efficient algorithms that produce approximate solutions to the WMP in reasonable time is an appealing research direction that can lead to several new applications including formal verification of systems abstracted as WFAs. In particular, in combination with a recent procedure that translates a recurrent neural network into a weighted automaton, an algorithm for the WMP can be used to analyze and verify the network by exploiting the simpler and more compact automata model. In this work, we propose, implement and evaluate a metaheuristic based on genetic algorithms to approximate solutions to the WMP. We experimentally evaluate its performance on examples from the literature and show its potential on different applications.

* Accepted at GECCO 2020
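
To make the problem statement concrete, the sketch below evaluates word weights on a real-valued WFA and solves a length-bounded WMP by exhaustive enumeration. This is a naive baseline for illustration, not the genetic algorithm proposed in the paper; the example automaton and the names `wfa_weight` and `best_word_bounded` are hypothetical.

```python
from itertools import product

def wfa_weight(initial, transitions, final, word):
    """Weight of `word`: initial^T * M_{w1} * ... * M_{wn} * final."""
    vec = list(initial)
    for sym in word:
        mat = transitions[sym]
        vec = [
            sum(vec[i] * mat[i][j] for i in range(len(vec)))
            for j in range(len(mat[0]))
        ]
    return sum(v * f for v, f in zip(vec, final))

def best_word_bounded(initial, transitions, final, max_len):
    """Exhaustively search all words of length <= max_len (exponential cost)."""
    alphabet = sorted(transitions)
    best_word = ""
    best_weight = wfa_weight(initial, transitions, final, "")
    for n in range(1, max_len + 1):
        for letters in product(alphabet, repeat=n):
            word = "".join(letters)
            weight = wfa_weight(initial, transitions, final, word)
            if weight > best_weight:
                best_word, best_weight = word, weight
    return best_word, best_weight

# A hypothetical two-state WFA over the alphabet {a, b}.
initial = [1.0, 0.0]
final = [0.0, 1.0]
transitions = {
    "a": [[0.5, 0.5], [0.0, 0.0]],
    "b": [[0.0, 0.0], [0.0, 2.0]],
}

print(best_word_bounded(initial, transitions, final, 3))  # ('abb', 2.0)
```

The number of candidate words grows exponentially with the length bound, which is one reason approximate methods such as the metaheuristic described above are of interest.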

Weighted Automata Extraction from Recurrent Neural Networks via Regression on State Spaces

Apr 08, 2019

Takamasa Okudono, Masaki Waga, Taro Sekiyama, Ichiro Hasuo

Abstract:We present a method to extract a weighted finite automaton (WFA) from a recurrent neural network (RNN). Our algorithm is based on the WFA learning algorithm by Balle and Mohri, which is in turn an extension of Angluin's classic L* algorithm. Our technical novelty is in the use of regression methods for the so-called equivalence queries, thus exploiting the internal state space of an RNN. This way we achieve a quantitative extension of the recent work by Weiss, Goldberg and Yahav that extracts DFAs. Experiments demonstrate our algorithm's practicality.

* We are preparing to distribute the implementation
