Abstract: Current Continual Knowledge Graph Embedding (CKGE) methods rely primarily on translation-based embeddings and use previously acquired knowledge to initialize new facts. To improve learning efficiency, these methods often integrate fine-tuning or continual learning strategies. However, this compromises prediction accuracy, and translation-based methods lack support for complex relational structures (multi-hop relations). To address these limitations, we propose SoTCKGE, a novel CKGE framework grounded in Spatial Offset Transformation. Within this framework, an entity's position is jointly determined by a base position vector and an offset vector. This not only enhances the model's ability to represent complex relational structures but also allows the embeddings of both new and old knowledge to be updated through simple spatial offset transformations, without the need for continual learning methods. Furthermore, we introduce a hierarchical update strategy and a balanced embedding method to refine the parameter update process, reducing training costs and improving model accuracy. To assess the performance of our model, we conducted extensive experiments on four publicly accessible datasets and a new dataset that we constructed. Experimental results demonstrate the advantage of our model in multi-hop relation learning and in prediction accuracy.
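The idea of a position built from a base vector and an offset vector can be illustrated with a toy sketch. The abstract does not give the actual formulation, so everything below is an assumption made for illustration: the additive form `base + offset`, the TransE-style distance used as a score, and the names `position` and `translation_score` are all hypothetical and are not SoTCKGE's method.

```python
import math

def position(base, offset):
    """Entity position as base vector plus offset vector (assumed additive form)."""
    return [b + o for b, o in zip(base, offset)]

def translation_score(head_pos, relation, tail_pos):
    """TransE-style distance ||h + r - t||; a lower value means a more plausible fact."""
    return math.sqrt(
        sum((h + r - t) ** 2 for h, r, t in zip(head_pos, relation, tail_pos))
    )

# Hypothetical 2-d toy values, chosen only to make the arithmetic easy to follow.
base = {"a": [0.0, 0.0], "b": [1.0, 0.0]}
offset = {"a": [0.0, 0.0], "b": [0.0, 0.0]}
relation = [1.0, 1.0]

# Score of the fact (a, relation, b) before any update.
before = translation_score(
    position(base["a"], offset["a"]), relation, position(base["b"], offset["b"])
)

# Assumed update step: only the offset of "b" changes; its base vector is untouched.
offset["b"] = [0.0, 1.0]

after = translation_score(
    position(base["a"], offset["a"]), relation, position(base["b"], offset["b"])
)
```

In this toy setting the fact becomes a better fit after the offset update while the base vectors stay fixed, which is the kind of lightweight update the abstract describes.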
Abstract: Misinformation can seriously impact society, affecting anything from public opinion to institutional confidence and the political horizon of a state. The proliferation of Fake News (FN) on websites and Online Social Networks (OSNs) has increased sharply. Most fact-checking websites cover news in English and provide little information about FN in regional languages. Thus, purveyors of Urdu FN cannot be identified using fact-checking portals. State-of-the-art (SOTA) approaches for Fake News Detection (FND) rely on large, appropriately labelled datasets. FND in regional and resource-constrained languages lags behind due to the lack of sufficiently large datasets and legitimate lexical resources. Previous datasets for Urdu FND are small, domain-restricted, not publicly available, and not manually verified, with news translated from English into Urdu. In this paper, we curate and contribute the first large, publicly available dataset for Urdu FND, Ax-to-Grind Urdu, to bridge the identified gaps and limitations of existing Urdu datasets in the literature. It comprises 10,083 fake and real news items across fifteen domains, collected from leading and authentic Urdu newspapers and news channel websites in Pakistan and India. FN for the Ax-to-Grind dataset is collected from websites and through crowdsourcing. The dataset contains news items in Urdu from 2017 to 2023. Expert journalists annotated the dataset. We benchmark the dataset with an ensemble model of mBERT, XLNet, and XLM-RoBERTa. The selected models were originally trained on large multilingual corpora. We evaluate the proposed model using F1-score, accuracy, precision, recall, and MCC.
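The evaluation metrics listed above can be computed from a binary confusion matrix, as in the following minimal sketch. The abstract does not say how the three models are combined, so the majority-vote rule here is an assumption, as are the label convention (1 = fake, 0 = real), the toy predictions, and the helper names `majority_vote` and `binary_metrics`.

```python
import math

def majority_vote(*model_predictions):
    """Combine per-model 0/1 label lists by majority vote (assumed ensembling rule)."""
    return [1 if 2 * sum(votes) > len(votes) else 0 for votes in zip(*model_predictions)]

def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, F1 and MCC for 0/1 labels (1 = fake, assumed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def safe_div(num, den):
        # Return 0.0 when a denominator is zero instead of raising.
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": safe_div(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
        "mcc": safe_div(tp * tn - fp * fn, mcc_den),
    }

# Hypothetical toy labels and per-model predictions, not taken from the dataset.
y_true = [1, 1, 1, 0, 0, 0]
model_a = [1, 1, 0, 0, 0, 1]
model_b = [1, 0, 1, 0, 1, 0]
model_c = [1, 1, 0, 0, 0, 1]

y_pred = majority_vote(model_a, model_b, model_c)
metrics = binary_metrics(y_true, y_pred)
```

MCC is worth reporting alongside accuracy because it uses all four confusion-matrix cells and so stays informative when the fake and real classes are imbalanced.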