Abstract:News data have become an essential resource across various disciplines, including economics, finance, management, social sciences, and computer science. Researchers leverage newspaper articles to study economic trends, market dynamics, corporate strategies, public perception, political discourse, and the evolution of public opinion. Additionally, news datasets have been instrumental in training large-scale language models, with applications in sentiment analysis, fake news detection, and automated news summarization. Despite their significance, access to comprehensive news corpora remains a key challenge. Many full-text news providers, such as Factiva and LexisNexis, require costly subscriptions, while free alternatives often suffer from incomplete data and transparency issues. This paper presents a novel approach to obtaining full-text newspaper articles at near-zero cost by leveraging data from the Global Database of Events, Language, and Tone (GDELT). Specifically, we focus on the GDELT Web News NGrams 3.0 dataset, which provides high-frequency updates of n-grams extracted from global online news sources. We provide Python code to reconstruct full-text articles from these n-grams by identifying overlapping textual fragments and intelligently merging them. Our method enables researchers to access structured, large-scale newspaper data for text analysis while overcoming the limitations of existing proprietary datasets. The proposed approach enhances the accessibility of news data for empirical research, facilitating applications in economic forecasting, computational social science, and natural language processing.
Abstract:Various macroeconomic and institutional factors hinder FDI inflows, including corruption, trade openness, access to finance, and political instability. Existing research mostly focuses on country-level data, with limited exploration of firm-level data, especially in developing countries. Recognizing this gap, recent calls for research emphasize the need for qualitative data analysis to delve into FDI determinants, particularly at the regional level. This paper proposes a novel methodology, based on text mining and social network analysis, to get information from more than 167,000 online news articles to quantify regional-level (sub-national) attributes affecting FDI ownership in African companies. Our analysis extends information on obstacles to industrial development as mapped by the World Bank Enterprise Surveys. Findings suggest that regional (sub-national) structural and institutional characteristics can play an important role in determining foreign ownership.