Abstract:Aviation safety is a global concern, requiring detailed investigations into incidents to understand contributing factors comprehensively. This study uses the National Transportation Safety Board (NTSB) dataset. It applies advanced natural language processing (NLP) techniques, including Latent Dirichlet Allocation (LDA), Non-Negative Matrix Factorization (NMF), Latent Semantic Analysis (LSA), Probabilistic Latent Semantic Analysis (pLSA), and K-means clustering. The main objectives are identifying latent themes, exploring semantic relationships, assessing probabilistic connections, and cluster incidents based on shared characteristics. This research contributes to aviation safety by providing insights into incident narratives and demonstrating the versatility of NLP and topic modelling techniques in extracting valuable information from complex datasets. The results, including topics identified from various techniques, provide an understanding of recurring themes. Comparative analysis reveals that LDA performed best with a coherence value of 0.597, pLSA of 0.583, LSA of 0.542, and NMF of 0.437. K-means clustering further reveals commonalities and unique insights into incident narratives. In conclusion, this study uncovers latent patterns and thematic structures within incident narratives, offering a comparative analysis of multiple-topic modelling techniques. Future research avenues include exploring temporal patterns, incorporating additional datasets, and developing predictive models for early identification of safety issues. This research lays the groundwork for enhancing the understanding and improvement of aviation safety by utilising the wealth of information embedded in incident narratives.
Abstract:Improvements in aviation safety analysis call for innovative techniques to extract valuable insights from the abundance of textual data available in accident reports. This paper explores the application of four prominent topic modelling techniques, namely Probabilistic Latent Semantic Analysis (pLSA), Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), and Non-negative Matrix Factorization (NMF), to dissect aviation incident narratives using the Australian Transport Safety Bureau (ATSB) dataset. The study examines each technique's ability to unveil latent thematic structures within the data, providing safety professionals with a systematic approach to gain actionable insights. Through a comparative analysis, this research not only showcases the potential of these methods in aviation safety but also elucidates their distinct advantages and limitations.