Abstract:Detecting Type 2 Diabetes (T2D) and Prediabetes (PD) is a real challenge for medicine due to the absence of pathogenic symptoms and the lack of known associated risk factors. Even though some proposals for machine learning models enable the identification of people at risk, the nature of the condition makes it so that a model suitable for one population may not necessarily be suitable for another. In this article, the development and assessment of predictive models to identify people at risk for T2D and PD specifically in Argentina are discussed. First, the database was thoroughly preprocessed and three specific datasets were generated considering a compromise between the number of records and the amount of available variables. After applying 5 different classification models, the results obtained show that a very good performance was observed for two datasets with some of these models. In particular, RF, DT, and ANN demonstrated great classification power, with good values for the metrics under consideration. Given the lack of this type of tool in Argentina, this work represents the first step towards the development of more sophisticated models.