Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Gilad Avidov

Flashlight: Enabling Innovation in Tools for Machine Learning

Jan 29, 2022

Jacob Kahn, Vineel Pratap, Tatiana Likhomanenko, Qiantong Xu, Awni Hannun, Jeff Cai, Paden Tomasello, Ann Lee, Edouard Grave, Gilad Avidov(+4 more)

Figure 1 for Flashlight: Enabling Innovation in Tools for Machine Learning

Figure 2 for Flashlight: Enabling Innovation in Tools for Machine Learning

Figure 3 for Flashlight: Enabling Innovation in Tools for Machine Learning

Figure 4 for Flashlight: Enabling Innovation in Tools for Machine Learning

Abstract:As the computational requirements for machine learning systems and the size and complexity of machine learning frameworks increases, essential framework innovation has become challenging. While computational needs have driven recent compiler, networking, and hardware advancements, utilization of those advancements by machine learning tools is occurring at a slower pace. This is in part due to the difficulties involved in prototyping new computational paradigms with existing frameworks. Large frameworks prioritize machine learning researchers and practitioners as end users and pay comparatively little attention to systems researchers who can push frameworks forward -- we argue that both are equally important stakeholders. We introduce Flashlight, an open-source library built to spur innovation in machine learning tools and systems by prioritizing open, modular, customizable internals and state-of-the-art, research-ready models and training setups across a variety of domains. Flashlight allows systems researchers to rapidly prototype and experiment with novel ideas in machine learning computation and has low overhead, competing with and often outperforming other popular machine learning frameworks. We see Flashlight as a tool enabling research that can benefit widely used libraries downstream and bring machine learning and systems researchers closer together.

Via

Access Paper or Ask Questions

Rethinking Evaluation in ASR: Are Our Models Robust Enough?

Oct 22, 2020

Tatiana Likhomanenko, Qiantong Xu, Vineel Pratap, Paden Tomasello, Jacob Kahn, Gilad Avidov, Ronan Collobert, Gabriel Synnaeve

Figure 1 for Rethinking Evaluation in ASR: Are Our Models Robust Enough?

Figure 2 for Rethinking Evaluation in ASR: Are Our Models Robust Enough?

Figure 3 for Rethinking Evaluation in ASR: Are Our Models Robust Enough?

Figure 4 for Rethinking Evaluation in ASR: Are Our Models Robust Enough?

Abstract:Is pushing numbers on a single benchmark valuable in automatic speech recognition? Research results in acoustic modeling are typically evaluated based on performance on a single dataset. While the research community has coalesced around various benchmarks, we set out to understand generalization performance in acoustic modeling across datasets -- in particular, if models trained on a single dataset transfer to other (possibly out-of-domain) datasets. Further, we demonstrate that when a large enough set of benchmarks is used, average word error rate (WER) performance over them provides a good proxy for performance on real-world data. Finally, we show that training a single acoustic model on the most widely-used datasets -- combined -- reaches competitive performance on both research and real-world benchmarks.

Via

Access Paper or Ask Questions

Scaling Up Online Speech Recognition Using ConvNets

Jan 27, 2020

Vineel Pratap, Qiantong Xu, Jacob Kahn, Gilad Avidov, Tatiana Likhomanenko, Awni Hannun, Vitaliy Liptchinsky, Gabriel Synnaeve, Ronan Collobert

Figure 1 for Scaling Up Online Speech Recognition Using ConvNets

Figure 2 for Scaling Up Online Speech Recognition Using ConvNets

Figure 3 for Scaling Up Online Speech Recognition Using ConvNets

Figure 4 for Scaling Up Online Speech Recognition Using ConvNets

Abstract:We design an online end-to-end speech recognition system based on Time-Depth Separable (TDS) convolutions and Connectionist Temporal Classification (CTC). We improve the core TDS architecture in order to limit the future context and hence reduce latency while maintaining accuracy. The system has almost three times the throughput of a well tuned hybrid ASR baseline while also having lower latency and a better word error rate. Also important to the efficiency of the recognizer is our highly optimized beam search decoder. To show the impact of our design choices, we analyze throughput, latency, accuracy, and discuss how these metrics can be tuned based on the user requirements.

Via

Access Paper or Ask Questions