Irving Gómez-Méndez

Geological Inference from Textual Data using Word Embeddings

Apr 10, 2025

Nanmanas Linphrachaya, Irving Gómez-Méndez, Adil Siripatana

Abstract: This research explores the use of Natural Language Processing (NLP) techniques to locate geological resources, with a specific focus on industrial minerals. Using word embeddings trained with the GloVe model, we extract semantic relationships between target keywords and a corpus of geological texts. The text is filtered to retain only words with geographical significance, such as city names, which are then ranked by their cosine similarity to the target keyword. Dimensionality reduction techniques, including Principal Component Analysis (PCA), Autoencoder, Variational Autoencoder (VAE), and VAE with Long Short-Term Memory (VAE-LSTM), are applied to enhance feature extraction and improve the accuracy of semantic relations. For benchmarking, we use the haversine formula to calculate the distance between the ten cities most semantically related to the target keyword and identified mine locations. The results demonstrate that combining NLP with dimensionality reduction techniques provides meaningful insights into the spatial distribution of natural resources. Although the results fall in the same region as the expected locations, the accuracy has room for improvement.
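The ranking and benchmarking steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the three-dimensional toy vectors, the place names, and the coordinates are all hypothetical, and real GloVe vectors would have many more dimensions. The sketch only shows how cosine similarity can rank place-name vectors against a keyword vector and how the haversine formula gives a great-circle distance to a reference point.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_places(keyword_vec, place_vecs, top_k=10):
    """Return up to top_k (name, similarity) pairs, most similar first."""
    scored = [(name, cosine_similarity(keyword_vec, vec))
              for name, vec in place_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to guard against rounding slightly above 1.0.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


if __name__ == "__main__":
    # Hypothetical toy embeddings standing in for trained GloVe vectors.
    keyword = [0.9, 0.1, 0.3]
    places = {
        "city_a": [0.8, 0.2, 0.4],
        "city_b": [-0.5, 0.9, 0.0],
        "city_c": [0.1, 0.1, 0.9],
    }
    # Hypothetical coordinates (latitude, longitude) in degrees.
    coords = {"city_a": (10.0, 100.0), "city_b": (12.0, 101.0), "city_c": (15.0, 99.0)}
    mine = (10.5, 100.5)  # hypothetical reference mine location

    for name, score in rank_places(keyword, places, top_k=2):
        dist = haversine_km(*coords[name], *mine)
        print(f"{name}: similarity={score:.3f}, distance={dist:.1f} km")
```

Under this setup, a smaller distance between the top-ranked places and the reference point would indicate that the embedding captured a useful spatial signal.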

Regression with Missing Data, a Comparison Study of Techniques Based on Random Forests

Oct 18, 2021

Irving Gómez-Méndez, Emilien Joly

Abstract: In this paper we present the practical benefits of a new random forest algorithm to deal with missing values in the sample. The purpose of this work is to compare the different solutions to deal with missing values with random forests and to describe our new algorithm's performance as well as its algorithmic complexity. A variety of missing value mechanisms (such as MCAR, MAR, MNAR) are considered and simulated. We study the quadratic errors and the bias of our algorithm and compare it to the most popular missing values random forests algorithms in the literature. In particular, we compare those techniques for both regression and prediction purposes. This work follows a first paper, Gomez-Mendez and Joly (2020), on the consistency of this new algorithm.
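To make the three missingness mechanisms named in the abstract concrete, here is a minimal sketch of how each could be simulated on a two-column dataset. It is not the simulation design used in the paper: the function names, the threshold rule, and the probabilities are hypothetical choices for illustration. In each case the first column `x1` may be masked with `None`, while the second column `x2` is always observed.

```python
import random


def mask_mcar(rows, p, rng):
    """MCAR: x1 is dropped with the same probability p, regardless of the data."""
    return [(None if rng.random() < p else x1, x2) for x1, x2 in rows]


def mask_mar(rows, threshold, p_high, p_low, rng):
    """MAR: the chance that x1 is missing depends only on the observed x2."""
    masked = []
    for x1, x2 in rows:
        p = p_high if x2 > threshold else p_low
        masked.append((None if rng.random() < p else x1, x2))
    return masked


def mask_mnar(rows, threshold, p_high, p_low, rng):
    """MNAR: the chance that x1 is missing depends on the value of x1 itself."""
    masked = []
    for x1, x2 in rows:
        p = p_high if x1 > threshold else p_low
        masked.append((None if rng.random() < p else x1, x2))
    return masked


def missing_rate(rows):
    """Fraction of rows whose x1 entry is missing."""
    return sum(1 for x1, _ in rows if x1 is None) / len(rows)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is reproducible
    data = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(1000)]

    print("MCAR rate:", missing_rate(mask_mcar(data, 0.2, rng)))
    print("MAR rate: ", missing_rate(mask_mar(data, 0.0, 0.4, 0.1, rng)))
    print("MNAR rate:", missing_rate(mask_mnar(data, 0.0, 0.4, 0.1, rng)))
```

The distinction matters for a comparison study because simple strategies such as dropping incomplete rows remain unbiased under MCAR but generally not under MAR or MNAR.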
