Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jianbin Li

CertDW: Towards Certified Dataset Ownership Verification via Conformal Prediction

Jun 16, 2025

Ting Qiao, Yiming Li, Jianbin Li, Yingjia Wang, Leyi Qi, Junfeng Guo, Ruili Feng, Dacheng Tao

Abstract:Deep neural networks (DNNs) rely heavily on high-quality open-source datasets (e.g., ImageNet) for their success, making dataset ownership verification (DOV) crucial for protecting public dataset copyrights. In this paper, we find existing DOV methods (implicitly) assume that the verification process is faithful, where the suspicious model will directly verify ownership by using the verification samples as input and returning their results. However, this assumption may not necessarily hold in practice and their performance may degrade sharply when subjected to intentional or unintentional perturbations. To address this limitation, we propose the first certified dataset watermark (i.e., CertDW) and CertDW-based certified dataset ownership verification method that ensures reliable verification even under malicious attacks, under certain conditions (e.g., constrained pixel-level perturbation). Specifically, inspired by conformal prediction, we introduce two statistical measures, including principal probability (PP) and watermark robustness (WR), to assess model prediction stability on benign and watermarked samples under noise perturbations. We prove there exists a provable lower bound between PP and WR, enabling ownership verification when a suspicious model's WR value significantly exceeds the PP values of multiple benign models trained on watermark-free datasets. If the number of PP values smaller than WR exceeds a threshold, the suspicious model is regarded as having been trained on the protected dataset. Extensive experiments on benchmark datasets verify the effectiveness of our CertDW method and its resistance to potential adaptive attacks. Our codes are at \href{https://github.com/NcepuQiaoTing/CertDW}{GitHub}.

* The first two authors contributed equally to this work. 16 pages

Via

Access Paper or Ask Questions

SE-GCL: An Event-Based Simple and Effective Graph Contrastive Learning for Text Representation

Dec 16, 2024

Tao Meng, Wei Ai, Jianbin Li, Ze Wang, Yuntao Shou, Keqin Li

Figure 1 for SE-GCL: An Event-Based Simple and Effective Graph Contrastive Learning for Text Representation

Figure 2 for SE-GCL: An Event-Based Simple and Effective Graph Contrastive Learning for Text Representation

Figure 3 for SE-GCL: An Event-Based Simple and Effective Graph Contrastive Learning for Text Representation

Figure 4 for SE-GCL: An Event-Based Simple and Effective Graph Contrastive Learning for Text Representation

Abstract:Text representation learning is significant as the cornerstone of natural language processing. In recent years, graph contrastive learning (GCL) has been widely used in text representation learning due to its ability to represent and capture complex text information in a self-supervised setting. However, current mainstream graph contrastive learning methods often require the incorporation of domain knowledge or cumbersome computations to guide the data augmentation process, which significantly limits the application efficiency and scope of GCL. Additionally, many methods learn text representations only by constructing word-document relationships, which overlooks the rich contextual semantic information in the text. To address these issues and exploit representative textual semantics, we present an event-based, simple, and effective graph contrastive learning (SE-GCL) for text representation. Precisely, we extract event blocks from text and construct internal relation graphs to represent inter-semantic interconnections, which can ensure that the most critical semantic information is preserved. Then, we devise a streamlined, unsupervised graph contrastive learning framework to leverage the complementary nature of the event semantic and structural information for intricate feature data capture. In particular, we introduce the concept of an event skeleton for core representation semantics and simplify the typically complex data augmentation techniques found in existing graph contrastive learning to boost algorithmic efficiency. We employ multiple loss functions to prompt diverse embeddings to converge or diverge within a confined distance in the vector space, ultimately achieving a harmonious equilibrium. We conducted experiments on the proposed SE-GCL on four standard data sets (AG News, 20NG, SougouNews, and THUCNews) to verify its effectiveness in text representation learning.

* 19 pages, 6 tables

Via

Access Paper or Ask Questions

Contrastive Multi-graph Learning with Neighbor Hierarchical Sifting for Semi-supervised Text Classification

Nov 25, 2024

Wei Ai, Jianbin Li, Ze Wang, Yingying Wei, Tao Meng, Yuntao Shou, Keqin Lib

Figure 1 for Contrastive Multi-graph Learning with Neighbor Hierarchical Sifting for Semi-supervised Text Classification

Figure 2 for Contrastive Multi-graph Learning with Neighbor Hierarchical Sifting for Semi-supervised Text Classification

Figure 3 for Contrastive Multi-graph Learning with Neighbor Hierarchical Sifting for Semi-supervised Text Classification

Figure 4 for Contrastive Multi-graph Learning with Neighbor Hierarchical Sifting for Semi-supervised Text Classification

Abstract:Graph contrastive learning has been successfully applied in text classification due to its remarkable ability for self-supervised node representation learning. However, explicit graph augmentations may lead to a loss of semantics in the contrastive views. Secondly, existing methods tend to overlook edge features and the varying significance of node features during multi-graph learning. Moreover, the contrastive loss suffer from false negatives. To address these limitations, we propose a novel method of contrastive multi-graph learning with neighbor hierarchical sifting for semi-supervised text classification, namely ConNHS. Specifically, we exploit core features to form a multi-relational text graph, enhancing semantic connections among texts. By separating text graphs, we provide diverse views for contrastive learning. Our approach ensures optimal preservation of the graph information, minimizing data loss and distortion. Then, we separately execute relation-aware propagation and cross-graph attention propagation, which effectively leverages the varying correlations between nodes and edge features while harmonising the information fusion across graphs. Subsequently, we present the neighbor hierarchical sifting loss (NHS) to refine the negative selection. For one thing, following the homophily assumption, NHS masks first-order neighbors of the anchor and positives from being negatives. For another, NHS excludes the high-order neighbors analogous to the anchor based on their similarities. Consequently, it effectively reduces the occurrence of false negatives, preventing the expansion of the distance between similar samples in the embedding space. Our experiments on ThuCNews, SogouNews, 20 Newsgroups, and Ohsumed datasets achieved 95.86\%, 97.52\%, 87.43\%, and 70.65\%, which demonstrates competitive results in semi-supervised text classification.

* 16 pages, 6 figures

Via

Access Paper or Ask Questions

SEG:Seeds-Enhanced Iterative Refinement Graph Neural Network for Entity Alignment

Oct 28, 2024

Wei Ai, Yinghui Gao, Jianbin Li, Jiayi Du, Tao Meng, Yuntao Shou, Keqin Li

Figure 1 for SEG:Seeds-Enhanced Iterative Refinement Graph Neural Network for Entity Alignment

Figure 2 for SEG:Seeds-Enhanced Iterative Refinement Graph Neural Network for Entity Alignment

Figure 3 for SEG:Seeds-Enhanced Iterative Refinement Graph Neural Network for Entity Alignment

Figure 4 for SEG:Seeds-Enhanced Iterative Refinement Graph Neural Network for Entity Alignment

Abstract:Entity alignment is crucial for merging knowledge across knowledge graphs, as it matches entities with identical semantics. The standard method matches these entities based on their embedding similarities using semi-supervised learning. However, diverse data sources lead to non-isomorphic neighborhood structures for aligned entities, complicating alignment, especially for less common and sparsely connected entities. This paper presents a soft label propagation framework that integrates multi-source data and iterative seed enhancement, addressing scalability challenges in handling extensive datasets where scale computing excels. The framework uses seeds for anchoring and selects optimal relationship pairs to create soft labels rich in neighborhood features and semantic relationship data. A bidirectional weighted joint loss function is implemented, which reduces the distance between positive samples and differentially processes negative samples, taking into account the non-isomorphic neighborhood structures. Our method outperforms existing semi-supervised approaches, as evidenced by superior results on multiple datasets, significantly improving the quality of entity alignment.

* 7, 2 figures

Via

Access Paper or Ask Questions

Graph Contrastive Learning via Cluster-refined Negative Sampling for Semi-supervised Text Classification

Oct 18, 2024

Wei Ai, Jianbin Li, Ze Wang, Jiayi Du, Tao Meng, Yuntao Shou, Keqin Li

Figure 1 for Graph Contrastive Learning via Cluster-refined Negative Sampling for Semi-supervised Text Classification

Figure 2 for Graph Contrastive Learning via Cluster-refined Negative Sampling for Semi-supervised Text Classification

Figure 3 for Graph Contrastive Learning via Cluster-refined Negative Sampling for Semi-supervised Text Classification

Figure 4 for Graph Contrastive Learning via Cluster-refined Negative Sampling for Semi-supervised Text Classification

Abstract:Graph contrastive learning (GCL) has been widely applied to text classification tasks due to its ability to generate self-supervised signals from unlabeled data, thus facilitating model training. However, existing GCL-based text classification methods often suffer from negative sampling bias, where similar nodes are incorrectly paired as negative pairs. This can lead to over-clustering, where instances of the same class are divided into different clusters. To address the over-clustering issue, we propose an innovative GCL-based method of graph contrastive learning via cluster-refined negative sampling for semi-supervised text classification, namely ClusterText. Firstly, we combine the pre-trained model Bert with graph neural networks to learn text representations. Secondly, we introduce a clustering refinement strategy, which clusters the learned text representations to obtain pseudo labels. For each text node, its negative sample set is drawn from different clusters. Additionally, we propose a self-correction mechanism to mitigate the loss of true negative samples caused by clustering inconsistency. By calculating the Euclidean distance between each text node and other nodes within the same cluster, distant nodes are still selected as negative samples. Our proposed ClusterText demonstrates good scalable computing, as it can effectively extract important information from from a large amount of data. Experimental results demonstrate the superiority of ClusterText in text classification tasks.

* 7 pages, 3 figures

Via

Access Paper or Ask Questions