Y. Shen

Enhancing the De-identification of Personally Identifiable Information in Educational Data

Jan 14, 2025

Y. Shen, Z. Ji, J. Lin, K. R. Koedinger

Abstract: Protecting Personally Identifiable Information (PII), such as names, is a critical requirement in learning technologies to safeguard student and teacher privacy and maintain trust. Accurate PII detection is an essential step toward anonymizing sensitive information while preserving the utility of educational data. Motivated by recent advancements in artificial intelligence, our study investigates the GPT-4o-mini model as a cost-effective and efficient solution for PII detection tasks. We explore both prompting and fine-tuning approaches and compare GPT-4o-mini's performance against established frameworks, including Microsoft Presidio and Azure AI Language. Our evaluation on two public datasets, CRAPII and TSCC, demonstrates that the fine-tuned GPT-4o-mini model achieves superior performance, with a recall of 0.9589 on CRAPII. Additionally, fine-tuned GPT-4o-mini significantly improves precision scores (a threefold increase) while reducing computational costs to nearly one-tenth of those associated with Azure AI Language. Furthermore, our bias analysis reveals that the fine-tuned GPT-4o-mini model consistently delivers accurate results across diverse cultural backgrounds and genders. The generalizability analysis using the TSCC dataset further highlights its robustness, achieving a recall of 0.9895 with minimal additional training data from TSCC. These results emphasize the potential of fine-tuned GPT-4o-mini as an accurate and cost-effective tool for PII detection in educational data. It offers robust privacy protection while preserving the data's utility for research and pedagogical analysis. Our code is available on GitHub: https://github.com/AnonJD/PrivacyAI

* 14 pages, 1 figure; This work has been submitted to the IEEE for possible publication
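
The abstract above reports recall and precision for PII detection. As a rough illustration of what those two metrics measure, here is a minimal sketch that scores predicted PII spans against gold annotations by exact match. The span format `(doc_id, start, end)`, the function name `precision_recall`, and the toy data are all assumptions made for this example; the paper's actual evaluation protocol is not described in this listing and may differ (for instance, token-level or partial-match scoring).

```python
def precision_recall(predicted, gold):
    """Exact-match precision and recall over sets of (doc_id, start, end) spans.

    Hypothetical helper for illustration only; not taken from the paper's code.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: share of predicted spans that are real PII.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of real PII spans that the detector found.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy data (made up): three annotated PII spans, four predicted spans.
gold_spans = {("essay1", 0, 8), ("essay1", 40, 51), ("essay2", 5, 12)}
predicted_spans = {
    ("essay1", 0, 8),
    ("essay2", 5, 12),
    ("essay2", 30, 36),  # false positive
    ("essay2", 50, 55),  # false positive
}

precision, recall = precision_recall(predicted_spans, gold_spans)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In a de-identification setting, a missed span (lower recall) leaks PII, while a spurious span (lower precision) redacts useful text, which is why the abstract discusses both.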

Tightly Coupled Learning Strategy for Weakly Supervised Hierarchical Place Recognition

Feb 14, 2022

Y. Shen, R. Wang, W. Zuo, N. Zheng

Abstract: Visual place recognition (VPR) is a key issue for robotics and autonomous systems. To balance time and performance, most methods use a coarse-to-fine hierarchical architecture, which consists of retrieving top-N candidates using global features and re-ranking the top-N with local features. However, since the two types of features are usually processed independently, re-ranking may harm global retrieval, a problem termed re-ranking confusion. Moreover, re-ranking is limited by global retrieval. In this paper, we propose a tightly coupled learning (TCL) strategy to train triplet models. Unlike the original triplet learning (OTL) strategy, it combines global and local descriptors for joint optimization. In addition, a bidirectional search dynamic time warping (BS-DTW) algorithm is proposed to mine local spatial information tailored to VPR in re-ranking. The experimental results on public benchmarks show that the models using TCL outperform the models using OTL, and that TCL can be used as a general strategy to improve performance for weakly supervised ranking tasks. Further, our lightweight unified model outperforms several state-of-the-art methods and is over an order of magnitude more computationally efficient, meeting the real-time requirements of robots.

* 8 pages, 9 figures
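
The coarse-to-fine architecture described in the abstract above (retrieve top-N by global features, then re-rank by local features) can be sketched roughly as follows. This is a generic illustration only: it does not implement TCL or BS-DTW, and the data layout, the function names, and the simple best-match local score are all assumptions made for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def local_score(query_locals, candidate_locals):
    """Mean best-match similarity per query local descriptor.

    A simple stand-in for a real local matcher (assumption, not BS-DTW).
    """
    best = [max(cosine(q, c) for c in candidate_locals) for q in query_locals]
    return sum(best) / len(best)


def hierarchical_retrieve(query, database, top_n):
    """Coarse stage: top-N by global similarity. Fine stage: re-rank by local score."""
    coarse = sorted(
        database,
        key=lambda entry: cosine(query["global"], entry["global"]),
        reverse=True,
    )[:top_n]
    return sorted(
        coarse,
        key=lambda entry: local_score(query["local"], entry["local"]),
        reverse=True,
    )


# Toy descriptors (made up for illustration).
query = {"global": [1.0, 0.0], "local": [[1.0, 0.0], [0.0, 1.0]]}
database = [
    {"id": "A", "global": [0.9, 0.1], "local": [[1.0, 0.0], [1.0, 0.0]]},
    {"id": "B", "global": [0.8, 0.2], "local": [[1.0, 0.0], [0.0, 1.0]]},
    {"id": "C", "global": [0.0, 1.0], "local": [[1.0, 0.0], [0.0, 1.0]]},
]

ranked = hierarchical_retrieve(query, database, top_n=2)
print([entry["id"] for entry in ranked])
```

In this toy data, entry C matches the query's local descriptors perfectly but is dropped in the coarse stage, so re-ranking never sees it. That is one way to picture the abstract's point that re-ranking is limited by global retrieval when the two stages are handled independently.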

An enhanced computational feature selection method for medical synonym identification via bilingualism and multi-corpus training

Dec 05, 2018

K. Lei, S. Si, D. Wen, Y. Shen

Abstract: Medical synonym identification has been an important part of medical natural language processing (NLP). However, Chinese medical synonym identification suffers from problems such as low precision and low recall. To address these problems, we propose a method for identifying Chinese medical synonyms. We first selected 13 features, including Chinese and English features. We then studied the synonym identification results of each feature alone and of different combinations of the features. By comparing the identification results, we present an optimal combination of features for Chinese medical synonym identification. Experiments show that our selected features achieve a 97.37% precision rate, a 96.00% recall rate, and a 97.33% F1 score.
