Abstract:In 2015, the United Nations put forward 17 Sustainable Development Goals (SDGs) to be achieved by 2030, where data has been promoted as a focus to innovating sustainable development and as a means to measuring progress towards achieving the SDGs. In this study, we propose a systematic approach towards discovering data types and sources that can be used for SDG research. The proposed method integrates a systematic mapping approach using manual qualitative coding over a corpus of SDG-related research literature followed by an automated process that applies rules to perform data entity extraction computationally. This approach is exemplified by an analysis of literature relating to SDG 7, the results of which are also presented in this paper. The paper concludes with a discussion of the approach and suggests future work to extend the method with more advance NLP and machine learning techniques.