Abstract:Rich-text captions are essential to help communication for Deaf and hard-of-hearing (DHH) people, second-language learners, and those with autism spectrum disorder (ASD). They also preserve nuances when converting speech to text, enhancing the realism of presentation scripts and conversation or speech logs. However, current real-time captioning systems lack the capability to alter text attributes (ex. capitalization, sizes, and fonts) at the word level, hindering the accurate conveyance of speaker intent that is expressed in the tones or intonations of the speech. For example, ''YOU should do this'' tends to be considered as indicating ''You'' as the focus of the sentence, whereas ''You should do THIS'' tends to be ''This'' as the focus. This paper proposes a solution that changes the text decorations at the word level in real time. As a prototype, we developed an application that adjusts word size based on the loudness of each spoken word. Feedback from users implies that this system helped to convey the speaker's intent, offering a more engaging and accessible captioning experience.
Abstract:Whispering is a common privacy-preserving technique in voice-based interactions, but its effectiveness is limited in noisy environments. In conventional hardware- and software-based noise reduction approaches, isolating whispered speech from ambient noise and other speech sounds remains a challenge. We thus propose WhisperMask, a mask-type microphone featuring a large diaphragm with low sensitivity, making the wearer's voice significantly louder than the background noise. We evaluated WhisperMask using three key metrics: signal-to-noise ratio, quality of recorded voices, and speech recognition rate. Across all metrics, WhisperMask consistently outperformed traditional noise-suppressing microphones and software-based solutions. Notably, WhisperMask showed a 30% higher recognition accuracy for whispered speech recorded in an environment with 80 dB background noise compared with the pin microphone and earbuds. Furthermore, while a denoiser decreased the whispered speech recognition rate of these two microphones by approximately 20% at 30-60 dB noise, WhisperMask maintained a high performance even without denoising, surpassing the other microphones' performances by a significant margin.WhisperMask's design renders the wearer's voice as the dominant input and effectively suppresses background noise without relying on signal processing. This device allows for reliable voice interactions, such as phone calls and voice commands, in a wide range of noisy real-world scenarios while preserving user privacy.