Kinneret Misgav

ivrit.ai: A Comprehensive Dataset of Hebrew Speech for AI Research and Development

Jul 17, 2023

Yanir Marmor, Kinneret Misgav, Yair Lifshitz

Abstract: We introduce "ivrit.ai", a comprehensive Hebrew speech dataset that addresses the distinct lack of extensive, high-quality resources for advancing Automated Speech Recognition (ASR) technology in Hebrew. With over 3,300 speech hours and over a thousand diverse speakers, ivrit.ai offers a substantial compilation of Hebrew speech across various contexts. It is delivered in three forms to cater to varying research needs: raw unprocessed audio, data post-Voice Activity Detection, and partially transcribed data. The dataset stands out for its legal accessibility, permitting use at no cost, thereby serving as a crucial resource for researchers, developers, and commercial entities. ivrit.ai opens up numerous applications, offering vast potential to enhance AI capabilities in Hebrew. Future efforts aim to expand ivrit.ai further, thereby advancing Hebrew's standing in AI research and technology.

* 9 pages, 1 table and 3 figures
