Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Mohammad Sadegh Safari

PSRB: A Comprehensive Benchmark for Evaluating Persian ASR Systems

May 27, 2025

Nima Sedghiyeh, Sara Sadeghi, Reza Khodadadi, Farzin Kashani, Omid Aghdaei, Somayeh Rahimi, Mohammad Sadegh Safari

Abstract:Although Automatic Speech Recognition (ASR) systems have become an integral part of modern technology, their evaluation remains challenging, particularly for low-resource languages such as Persian. This paper introduces Persian Speech Recognition Benchmark(PSRB), a comprehensive benchmark designed to address this gap by incorporating diverse linguistic and acoustic conditions. We evaluate ten ASR systems, including state-of-the-art commercial and open-source models, to examine performance variations and inherent biases. Additionally, we conduct an in-depth analysis of Persian ASR transcriptions, identifying key error types and proposing a novel metric that weights substitution errors. This metric enhances evaluation robustness by reducing the impact of minor and partial errors, thereby improving the precision of performance assessment. Our findings indicate that while ASR models generally perform well on standard Persian, they struggle with regional accents, children's speech, and specific linguistic challenges. These results highlight the necessity of fine-tuning and incorporating diverse, representative training datasets to mitigate biases and enhance overall ASR performance. PSRB provides a valuable resource for advancing ASR research in Persian and serves as a framework for developing benchmarks in other low-resource languages. A subset of the PSRB dataset is publicly available at https://huggingface.co/datasets/PartAI/PSRB.

* 25 pages, 7 figures

Via

Access Paper or Ask Questions

End-to-End Target Speaker Speech Recognition Using Context-Aware Attention Mechanisms for Challenging Enrollment Scenario

Jan 26, 2025

Mohsen Ghane, Mohammad Sadegh Safari

Abstract:This paper presents a novel streaming end-to-end target-speaker speech recognition that addresses two critical limitations in systems: the handling of noisy enrollment utterances and specific enrollment phrase requirements. This paper proposes a robust Target-Speaker Recurrent Neural Network Transducer (TS-RNNT) with dual attention mechanisms for contextual biasing and overlapping enrollment processing. The model incorporates a text decoder and attention mechanism specifically designed to extract relevant speaker characteristics from noisy, overlapping enrollment audio. Experimental results on a synthesized dataset demonstrate the model's resilience, maintaining a Word Error Rate (WER) of 16.44% even with overlapping enrollment at 5dB Signal-to-Interference Ratio (SIR), compared to conventional approaches that degrade to WERs above 75% under similar conditions. This significant performance improvement, coupled with the model's semi-text-dependent enrollment capabilities, represents a substantial advancement toward more practical and versatile voice-controlled devices.

Via

Access Paper or Ask Questions

Progressive Transmission using Recurrent Neural Networks

Aug 03, 2021

Mohammad Sadegh Safari, Vahid Pourahmadi, Patrick Mitran, Hamid Sheikhzadeh

Figure 1 for Progressive Transmission using Recurrent Neural Networks

Figure 2 for Progressive Transmission using Recurrent Neural Networks

Figure 3 for Progressive Transmission using Recurrent Neural Networks

Figure 4 for Progressive Transmission using Recurrent Neural Networks

Abstract:In this paper, we investigate a new machine learning-based transmission strategy called progressive transmission or ProgTr. In ProgTr, there are b variables that should be transmitted using at most T channel uses. The transmitter aims to send the data to the receiver as fast as possible and with as few channel uses as possible (as channel conditions permit) while the receiver refines its estimate after each channel use. We use recurrent neural networks as the building block of both the transmitter and receiver where the SNR is provided as an input that represents the channel conditions. To show how ProgTr works, the proposed scheme was simulated in different scenarios including single/multi-user settings, different channel conditions, and for both discrete and continuous input data. The results show that ProgTr can achieve better performance compared to conventional modulation methods. In addition to performance metrics such as BER, bit-wise mutual information is used to provide some interpretation to how the transmitter and receiver operate in ProgTr.

Via

Access Paper or Ask Questions

Deep UL2DL: Channel Knowledge Transfer from Uplink to Downlink

Dec 16, 2018

Mohammad Sadegh Safari, Vahid Pourahmadi

Figure 1 for Deep UL2DL: Channel Knowledge Transfer from Uplink to Downlink

Figure 2 for Deep UL2DL: Channel Knowledge Transfer from Uplink to Downlink

Figure 3 for Deep UL2DL: Channel Knowledge Transfer from Uplink to Downlink

Figure 4 for Deep UL2DL: Channel Knowledge Transfer from Uplink to Downlink

Abstract:Knowledge of the channel state information (CSI) at the transmitter side is one of the primary sources of information that can be used for efficient allocation of wireless resources. Obtaining Down-Link (DL) CSI in FDD systems from Up-Link (UL) CSI is not as straightforward as TDD systems, and so usually users feedback the DL-CSI to the transmitter. To remove the need for feedback (and thus having less signaling overhead), several methods have been studied to estimate DL-CSI from UL-CSI. In this paper, we propose a scheme to infer DL-CSI by observing UL-CSI in which we use two recent deep neural network structures: a) Convolutional Neural network and b) Generative Adversarial Networks. The proposed deep network structures are first learning a latent model of the environment from the training data. Then, the resulted latent model is used to predict the DL-CSI from the UL-CSI. We have simulated the proposed scheme and evaluated its performance in a few network settings.

* 24 pages, 15 Figures

Via

Access Paper or Ask Questions