As the adage "a picture is worth a thousand words" expresses, brevity can be a challenge when using spoken language to communicate visual information. This work describes a novel technique that leverages machine-learned feature embeddings to translate visual (and other types of) information into a perceptual audio domain, allowing users to perceive this information using only their aural faculty. The system uses a pretrained image embedding network to extract visual features and embed them in a compact subset of Euclidean space, converting the images into feature vectors whose $L^2$ distances serve as a meaningful measure of similarity. A generative adversarial network (GAN) is then used to find a distance-preserving map from this metric space of feature vectors into the metric space defined by a target audio dataset, equipped with either the Euclidean metric or a mel-frequency cepstrum-based psychoacoustic distance metric. We demonstrate this technique by translating images of faces into human speech-like audio. For both target audio metrics, the GAN successfully found a distance-preserving mapping, and in human subject tests, users were able to accurately classify the audio translations of faces.
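
One way to formalize the distance-preservation objective sketched above (using our own notation, not necessarily the paper's: $\phi$ for the pretrained image embedding, $G$ for the GAN generator, and $d_A$ for the chosen audio metric) is an auxiliary loss that penalizes distortion of pairwise distances between the embedding space and the audio space:
\[
\mathcal{L}_{\mathrm{metric}} = \mathbb{E}_{x_1, x_2}\!\left[\Big( d_A\big(G(\phi(x_1)),\, G(\phi(x_2))\big) - \big\lVert \phi(x_1) - \phi(x_2) \big\rVert_2 \Big)^{\!2}\right],
\]
which would be minimized alongside the usual adversarial loss so that generated audio remains realistic while preserving the geometry of the feature vectors.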