Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yuxuan Lan

Finding phonemes: improving machine lip-reading

Oct 03, 2017

Helen L. Bear, Richard W. Harvey, Yuxuan Lan

Figure 1 for Finding phonemes: improving machine lip-reading

Figure 2 for Finding phonemes: improving machine lip-reading

Figure 3 for Finding phonemes: improving machine lip-reading

Figure 4 for Finding phonemes: improving machine lip-reading

Abstract:In machine lip-reading there is continued debate and research around the correct classes to be used for recognition. In this paper we use a structured approach for devising speaker-dependent viseme classes, which enables the creation of a set of phoneme-to-viseme maps where each has a different quantity of visemes ranging from two to 45. Viseme classes are based upon the mapping of articulated phonemes, which have been confused during phoneme recognition, into viseme groups. Using these maps, with the LiLIR dataset, we show the effect of changing the viseme map size in speaker-dependent machine lip-reading, measured by word recognition correctness and so demonstrate that word recognition with phoneme classifiers is not just possible, but often better than word recognition with viseme classifiers. Furthermore, there are intermediate units between visemes and phonemes which are better still.

* Helen L. Bear, Richard W. Harvey, Yuxuan Lan. Finding phonemes: improving machine lip-reading. Audio-Visual Speech Processing (AVSP), 2015 p115-120

Via

Access Paper or Ask Questions

Which phoneme-to-viseme maps best improve visual-only computer lip-reading?

Oct 03, 2017

Helen L. Bear, Richard W. Harvey, Barry-John Theobald, Yuxuan Lan

Figure 1 for Which phoneme-to-viseme maps best improve visual-only computer lip-reading?

Figure 2 for Which phoneme-to-viseme maps best improve visual-only computer lip-reading?

Figure 3 for Which phoneme-to-viseme maps best improve visual-only computer lip-reading?

Figure 4 for Which phoneme-to-viseme maps best improve visual-only computer lip-reading?

Abstract:A critical assumption of all current visual speech recognition systems is that there are visual speech units called visemes which can be mapped to units of acoustic speech, the phonemes. Despite there being a number of published maps it is infrequent to see the effectiveness of these tested, particularly on visual-only lip-reading (many works use audio-visual speech). Here we examine 120 mappings and consider if any are stable across talkers. We show a method for devising maps based on phoneme confusions from an automated lip-reading system, and we present new mappings that show improvements for individual talkers.

* Helen L. Bear, Richard W. Harvey, Barry-John Theobald, and Yuxuan Lan. Which phoneme-to-viseme maps best improve visual-only computer lip-reading? Advances in Visual Computing 2014. p230-239

Via

Access Paper or Ask Questions

Resolution limits on visual speech recognition

Oct 03, 2017

Helen L. Bear, Richard Harvey, Barry-John Theobald, Yuxuan Lan

Figure 1 for Resolution limits on visual speech recognition

Figure 2 for Resolution limits on visual speech recognition

Figure 3 for Resolution limits on visual speech recognition

Figure 4 for Resolution limits on visual speech recognition

Abstract:Visual-only speech recognition is dependent upon a number of factors that can be difficult to control, such as: lighting; identity; motion; emotion and expression. But some factors, such as video resolution are controllable, so it is surprising that there is not yet a systematic study of the effect of resolution on lip-reading. Here we use a new data set, the Rosetta Raven data, to train and test recognizers so we can measure the affect of video resolution on recognition accuracy. We conclude that, contrary to common practice, resolution need not be that great for automatic lip-reading. However it is highly unlikely that automatic lip-reading can work reliably when the distance between the bottom of the lower lip and the top of the upper lip is less than four pixels at rest.

* Helen L. Bear, Richard Harvey, Barry-John Theobald, Yuxuan Lan. Resolution limits on visual speech recognition. International Conference on Image Processing (ICIP). 2014. p1371-1375

Via

Access Paper or Ask Questions