Abstract:Multi-domain fake news detection aims to identify whether various news from different domains is real or fake and has become urgent and important. However, existing methods are dedicated to improving the overall performance of fake news detection, ignoring the fact that unbalanced data leads to disparate treatment for different domains, i.e., the domain bias problem. To solve this problem, we propose the Dual-Teacher De-biasing Distillation framework (DTDBD) to mitigate bias across different domains. Following the knowledge distillation methods, DTDBD adopts a teacher-student structure, where pre-trained large teachers instruct a student model. In particular, the DTDBD consists of an unbiased teacher and a clean teacher that jointly guide the student model in mitigating domain bias and maintaining performance. For the unbiased teacher, we introduce an adversarial de-biasing distillation loss to instruct the student model in learning unbiased domain knowledge. For the clean teacher, we design domain knowledge distillation loss, which effectively incentivizes the student model to focus on representing domain features while maintaining performance. Moreover, we present a momentum-based dynamic adjustment algorithm to trade off the effects of two teachers. Extensive experiments on Chinese and English datasets show that the proposed method substantially outperforms the state-of-the-art baseline methods in terms of bias metrics while guaranteeing competitive performance.
Abstract:Nowadays, with the rapid development of the Internet, the era of big data has come. The Internet generates huge amounts of data every day. However, extracting meaningful information from massive data is like looking for a needle in a haystack. Data mining techniques can provide various feasible methods to solve this problem. At present, many sequential rule mining (SRM) algorithms are presented to find sequential rules in databases with sequential characteristics. These rules help people extract a lot of meaningful information from massive amounts of data. How can we achieve compression of mined results and reduce data size to save storage space and transmission time? Until now, there has been little research on the compression of SRM. In this paper, combined with the Minimum Description Length (MDL) principle and under the two metrics (support and confidence), we introduce the problem of compression of SRM and also propose a solution named ComSR for MDL-based compressing of sequential rules based on the designed sequential rule coding scheme. To our knowledge, we are the first to use sequential rules to encode an entire database. A heuristic method is proposed to find a set of compact and meaningful sequential rules as much as possible. ComSR has two trade-off algorithms, ComSR_non and ComSR_ful, based on whether the database can be completely compressed. Experiments done on a real dataset with different thresholds show that a set of compact and meaningful sequential rules can be found. This shows that the proposed method works.
Abstract:In many real-world applications, sequential rule mining (SRM) can provide prediction and recommendation functions for a variety of services. It is an important technique of pattern mining to discover all valuable rules that belong to high-frequency and high-confidence sequential rules. Although several algorithms of SRM are proposed to solve various practical problems, there are no studies on target sequential rules. Targeted sequential rule mining aims at mining the interesting sequential rules that users focus on, thus avoiding the generation of other invalid and unnecessary rules. This approach can further improve the efficiency of users in analyzing rules and reduce the consumption of data resources. In this paper, we provide the relevant definitions of target sequential rule and formulate the problem of targeted sequential rule mining. Furthermore, we propose an efficient algorithm, called targeted sequential rule mining (TaSRM). Several pruning strategies and an optimization are introduced to improve the efficiency of TaSRM. Finally, a large number of experiments are conducted on different benchmarks, and we analyze the results in terms of their running time, memory consumption, and scalability, as well as query cases with different query rules. It is shown that the novel algorithm TaSRM and its variants can achieve better experimental performance compared to the existing baseline algorithm.
Abstract:Case-based reasoning (CBR) based on description logics (DLs) has gained a lot of attention lately. Adaptation is a basic task in the CBR inference that can be modeled as the knowledge base revision problem and solved in propositional logic. However, in DLs, it is still a challenge problem since existing revision operators only work well for strictly restricted DLs of the \emph{DL-Lite} family, and it is difficult to design a revision algorithm which is syntax-independent and fine-grained. In this paper, we present a new method for adaptation based on the DL $\mathcal{EL_{\bot}}$. Following the idea of adaptation as revision, we firstly extend the logical basis for describing cases from propositional logic to the DL $\mathcal{EL_{\bot}}$, and present a formalism for adaptation based on $\mathcal{EL_{\bot}}$. Then we present an adaptation algorithm for this formalism and demonstrate that our algorithm is syntax-independent and fine-grained. Our work provides a logical basis for adaptation in CBR systems where cases and domain knowledge are described by the tractable DL $\mathcal{EL_{\bot}}$.