Mathematics Department, University of Hertfordshire, UK
Abstract: We consider the application of machine learning to the evaluation of geothermal resource potential. A supervised learning problem is defined in which maps of 10 geological and geophysical features across the state of Nevada, USA, are used to characterize geothermal potential over a broad region. A relatively small set of positive training sites (known resources or active power plants) and negative training sites (known drill sites with unsuitable geothermal conditions) is available, and we use these to constrain and optimize artificial neural networks for this classification task. The main objective is to predict the geothermal resource potential at unknown sites within a large geographic area where the defining features are known. These predictions could be used to target promising areas for further detailed investigation. We describe the evolution of our work from defining a specific neural network architecture to training and optimization trials. Our analysis exposes the inevitable problems of model variability and the resulting prediction uncertainty. Finally, to address these problems, we apply Bayesian neural networks, a heuristic approach to regularization in network training, and make use of the practical interpretation of the formal uncertainty measures they provide.
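As a rough illustration of the variability analysis described above (not the authors' actual architecture, features, or data), the following Python sketch trains an ensemble of small neural network classifiers on ten synthetic features and inspects how the predicted favourability of unseen sites spreads across retrainings; the feature values, site labels, and network settings are all placeholder assumptions.

```python
# Illustrative sketch only: an ensemble of small MLP classifiers trained on
# 10 input features, used to expose prediction variability across retrainings.
# Feature values and site labels here are synthetic placeholders.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(80, 10))      # 10 geological/geophysical features per site
y_train = rng.integers(0, 2, size=80)    # 1 = positive site, 0 = negative site
X_unknown = rng.normal(size=(5, 10))     # sites to be assessed

# Train several networks that differ only in their random initialization;
# the spread of their outputs is a simple proxy for model variability.
probs = []
for seed in range(10):
    net = MLPClassifier(hidden_layer_sizes=(16,), alpha=1e-3,
                        max_iter=2000, random_state=seed)
    net.fit(X_train, y_train)
    probs.append(net.predict_proba(X_unknown)[:, 1])

probs = np.array(probs)
print("mean favourability:", probs.mean(axis=0))
print("std across retrainings:", probs.std(axis=0))
```

An ensemble spread of this kind is only a stand-in for the formal uncertainty measures that Bayesian neural networks provide, but it makes the model-variability problem concrete.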
Abstract: The text-critical practice of grouping witnesses into families or text-types often faces two obstacles: contamination in the manuscript tradition, and co-dependence in identifying characteristic readings and manuscripts. We introduce non-negative matrix factorization (NMF) as a simple, unsupervised, and efficient way to cluster large numbers of manuscripts and readings simultaneously while summarizing contamination using an easy-to-interpret mixture model. We apply this method to an extensive collation of the New Testament epistle of Jude and show that the resulting clusters correspond to human-identified textual families from existing research.
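As a rough illustration of the factorization described above (not the authors' collation or chosen number of families), the following Python sketch applies NMF to a synthetic 0/1 manuscript-by-reading matrix and reads off each manuscript's mixture over families; the matrix, the family count k, and the model settings are placeholder assumptions.

```python
# Illustrative sketch only: clustering manuscripts and readings together with NMF.
# The 0/1 matrix below is a synthetic stand-in for a real collation
# (rows = manuscripts, columns = variant readings attested by each manuscript).
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(1)
collation = (rng.random((30, 120)) < 0.2).astype(float)

k = 4  # assumed number of textual families; chosen by the analyst
model = NMF(n_components=k, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(collation)   # manuscripts x families
H = model.components_                # families x readings

# Row-normalizing W gives each manuscript a mixture over families,
# which is how contamination can be summarized and read off directly.
mixtures = W / (W.sum(axis=1, keepdims=True) + 1e-12)
print(mixtures[:5].round(2))

# The largest entries in each row of H indicate the readings most
# characteristic of that family.
top_readings = np.argsort(H, axis=1)[:, -5:]
print(top_readings)
```

Because both factors are non-negative, large entries in H can be read directly as the readings most characteristic of each family, which is what makes the mixture summary easy to interpret.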
Abstract: This paper introduces an objective metric for evaluating a parsing scheme. It is based on Shannon's original work with letter sequences, which can be extended to part-of-speech tag sequences. It is shown that such a regular-language model is inadequate for natural language, so a representation is adopted that models language slightly higher in the Chomsky hierarchy. We show how the entropy of parsed and unparsed sentences can be measured; if the entropy of the parsed sentence is lower, this indicates that some of the structure of the language has been captured. We apply this entropy indicator to support one particular parsing scheme that effects a top-down segmentation. This approach could be used to decompose the parsing task into computationally more tractable subtasks. It also lends itself to the extraction of predicate/argument structure.
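As a rough illustration of the entropy comparison described above (not the authors' corpus, tag set, or parsing scheme), the following Python sketch estimates the per-symbol entropy of toy part-of-speech tag sequences under an add-one-smoothed bigram model, once for the flat sequences and once with segment boundaries made explicit; all data and the bracketing convention are placeholder assumptions.

```python
# Minimal sketch of the kind of entropy comparison described in the abstract:
# per-symbol entropy of a tag sequence estimated from bigram counts with
# add-one smoothing. The tag data and the bracketed "parsed" version are toy
# examples, not the authors' corpus or parsing scheme.
import math
from collections import Counter

def bigram_entropy(sequences):
    """Average per-symbol entropy (bits) under an add-one-smoothed bigram model."""
    unigrams, bigrams = Counter(), Counter()
    vocab = set()
    for seq in sequences:
        padded = ["<s>"] + list(seq)
        vocab.update(padded)
        for a, b in zip(padded, padded[1:]):
            unigrams[a] += 1
            bigrams[(a, b)] += 1
    V = len(vocab)
    total, bits = 0, 0.0
    for seq in sequences:
        padded = ["<s>"] + list(seq)
        for a, b in zip(padded, padded[1:]):
            p = (bigrams[(a, b)] + 1) / (unigrams[a] + V)
            bits += -math.log2(p)
            total += 1
    return bits / total

# Unparsed: flat part-of-speech tag sequences.
unparsed = [["DT", "NN", "VB", "DT", "NN"], ["DT", "JJ", "NN", "VB", "RB"]]
# "Parsed": the same sequences with segment boundaries made explicit.
parsed = [["[", "DT", "NN", "]", "VB", "[", "DT", "NN", "]"],
          ["[", "DT", "JJ", "NN", "]", "VB", "RB"]]

print("unparsed:", bigram_entropy(unparsed))
print("parsed:  ", bigram_entropy(parsed))
```

On real data the model would be estimated on a held-out corpus rather than on the sentences being scored; the toy comparison above only shows the mechanics of the indicator.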