Abstract: ChatGPT has demonstrated impressive performance on various downstream tasks. However, in the Chinese Spelling Correction (CSC) task, we observe a discrepancy: while ChatGPT performs well under human evaluation, it scores poorly on traditional metrics. We believe this inconsistency arises because traditional metrics are ill-suited to evaluating generative models. Their overly strict length and phonics constraints may lead to underestimating ChatGPT's correction capabilities. To better evaluate generative models on the CSC task, this paper proposes a new evaluation metric, Eval-GCSC. By incorporating word-level and semantic similarity judgments, it relaxes the stringent length and phonics constraints. Experimental results show that Eval-GCSC closely aligns with human evaluations. Under this metric, ChatGPT's performance is comparable to that of traditional token-level classification models (TCM), demonstrating its potential as a CSC tool. The source code and scripts can be accessed at https://github.com/ktlKTL/Eval-GCSC.
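A minimal sketch of the idea the abstract describes: rather than requiring an exact character-level match between the model output and the gold correction, a relaxed metric can also accept outputs that are semantically close despite differing in length. Everything here is an illustrative assumption (function names, the similarity proxy, the threshold), not the authors' implementation; a real metric would use word alignment and embedding-based similarity.

```python
from difflib import SequenceMatcher

def semantic_similarity(a: str, b: str) -> float:
    # Placeholder: a real implementation would use sentence embeddings
    # (e.g., cosine similarity of encoder outputs). Character-level
    # overlap is used here purely for illustration.
    return SequenceMatcher(None, a, b).ratio()

def relaxed_accept(prediction: str, gold: str, sim_threshold: float = 0.8) -> bool:
    """Accept a generative correction if it matches the gold answer
    exactly, or is semantically close enough despite a length mismatch."""
    if prediction == gold:  # strict match still counts as correct
        return True
    return semantic_similarity(prediction, gold) >= sim_threshold

# A generative model may rephrase rather than minimally edit the input,
# which a strict exact-match metric would mark as wrong.
print(relaxed_accept("the cat sat on the mat", "the cat sat on the mat"))   # True (exact)
print(relaxed_accept("the cat sits on the mat", "the cat sat on the mat"))  # True (close)
```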
Abstract: The deep learning models used for speaker verification depend heavily on large-scale data and correct labels. However, noisy (incorrect) labels often occur, degrading system performance. Unfortunately, there are relatively few studies in this area. In this paper, we propose a method to gradually filter out noisy labels during training. We compare the network's predictions at different training epochs with the ground-truth labels, and select reliable (considered correct) labels using an OR-gate mechanism analogous to that in logic circuits; hence, the proposed method is named OR-Gate. We experimentally demonstrate that the OR-Gate effectively filters out noisy labels and achieves excellent performance.
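A hypothetical sketch of the OR-gate selection rule described above: a training label is treated as reliable if the network's prediction agrees with it at any of the monitored epochs (a logical OR across epochs), and is filtered out otherwise. The variable names, shapes, and toy data are assumptions for illustration, not the paper's code.

```python
import numpy as np

def or_gate_filter(preds_per_epoch: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """preds_per_epoch: (num_epochs, num_samples) predicted class ids.
    labels: (num_samples,) ground-truth (possibly noisy) class ids.
    Returns a boolean mask marking samples considered reliable."""
    agree = preds_per_epoch == labels[None, :]  # (num_epochs, num_samples)
    return agree.any(axis=0)                    # OR across epochs

# Toy example: predictions from 3 epochs over 4 samples.
preds = np.array([[0, 1, 2, 3],
                  [0, 2, 2, 3],
                  [1, 1, 2, 0]])
labels = np.array([0, 1, 0, 3])
print(or_gate_filter(preds, labels))  # [ True  True False  True]
```

Samples whose label never agrees with any epoch's prediction (the third sample above) are treated as mislabeled and excluded from subsequent training.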