Abstract: Effectively distinguishing the pronunciation correlations between different written texts is a significant problem in linguistic acoustics. Traditionally, such pronunciation correlations are obtained from manually designed pronunciation lexicons. In this paper, we propose a data-driven method for acquiring these correlations automatically, called automatic text pronunciation correlation (ATPC). The method requires the same supervision as training an end-to-end automatic speech recognition (E2E-ASR) system, i.e., speech and its corresponding text annotations. First, the iteratively-trained timestamp estimator (ITSE) algorithm aligns the speech with its corresponding annotated text symbols. Then, a speech encoder converts the speech into speech embeddings. Finally, we compare the distances between the speech embeddings of different text symbols to obtain ATPC. Experimental results on Mandarin show that ATPC enhances E2E-ASR performance in contextual biasing and holds promise for dialects or languages that lack manually designed pronunciation lexicons.
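The final step of the pipeline above (comparing embedding distances across text symbols) can be illustrated with a minimal sketch. It assumes the alignment and encoding steps have already produced (symbol, embedding) pairs. Mean pooling per symbol and cosine distance are illustrative choices made here, not details confirmed by the abstract, and the function names, symbols, and toy vectors are all hypothetical.

```python
import math


def mean_embeddings(aligned):
    """Average all embeddings observed for each text symbol (assumed pooling)."""
    sums, counts = {}, {}
    for symbol, vec in aligned:
        if symbol not in sums:
            sums[symbol] = [0.0] * len(vec)
            counts[symbol] = 0
        sums[symbol] = [s + v for s, v in zip(sums[symbol], vec)]
        counts[symbol] += 1
    return {sym: [s / counts[sym] for s in total] for sym, total in sums.items()}


def cosine_distance(a, b):
    """Return 1 - cosine similarity; treat a zero vector as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 1.0
    return 1.0 - dot / norm


def pronunciation_distances(aligned):
    """Pairwise distance between per-symbol mean embeddings.

    A smaller distance is read as a stronger pronunciation correlation.
    """
    centroids = mean_embeddings(aligned)
    symbols = sorted(centroids)
    return {
        (p, q): cosine_distance(centroids[p], centroids[q])
        for i, p in enumerate(symbols)
        for q in symbols[i + 1:]
    }


# Toy input: placeholder symbols with made-up 2-D embeddings.
aligned = [
    ("A", [1.0, 0.0]),
    ("A", [0.8, 0.2]),
    ("B", [0.9, 0.1]),
    ("C", [0.0, 1.0]),
]
distances = pronunciation_distances(aligned)
closest_pair = min(distances, key=distances.get)
```

In this toy data, symbols "A" and "B" share the same mean embedding, so they come out as the closest pair, while "C" is far from both. A real system would use high-dimensional encoder outputs and many aligned segments per symbol.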
Abstract: Federated learning (FL) is an emerging distributed machine learning paradigm that stands out for its inherent privacy-preserving advantages. Heterogeneity is one of the core challenges in FL; it arises from the diverse user behaviors and hardware capacities of the devices that participate in training. Heterogeneity inherently exerts a large influence on the FL training process, e.g., by causing device unavailability. However, existing FL literature usually ignores the impacts of heterogeneity. To fill this knowledge gap, we build FLASH, the first heterogeneity-aware FL platform. Based on FLASH and a large-scale user trace from 136k real-world users, we demonstrate the usefulness of FLASH in anatomizing the impacts of heterogeneity in FL by exploring three previously unaddressed research questions: whether and how heterogeneity affects FL performance; how to configure a heterogeneity-aware FL system; and what impacts heterogeneity has on existing FL optimizations. The results show that heterogeneity causes nontrivial performance degradation in FL in several respects and even invalidates some typical FL optimizations.