Zhongwei Teng

NL2CMD: An Updated Workflow for Natural Language to Bash Commands Translation

Feb 15, 2023

Quchen Fu, Zhongwei Teng, Marco Georgaklis, Jules White, Douglas C. Schmidt

Abstract: Translating natural language into Bash commands is an emerging research field that has gained attention in recent years. Most efforts have focused on producing more accurate translation models. To the best of our knowledge, only two datasets are available, with one based on the other. Both datasets were built by scraping known data sources (platforms like Stack Overflow, crowdsourcing, etc.) and hiring experts to validate and correct either the English text or the Bash commands. This paper provides two contributions to research on synthesizing Bash commands from scratch. First, we describe a state-of-the-art translation model used to generate Bash commands from the corresponding English text. Second, we introduce a new NL2CMD dataset that is automatically generated, involves minimal human intervention, and is over six times larger than prior datasets. Since the generation pipeline does not rely on existing Bash commands, the distribution and types of commands can be customized. Our empirical results show how the scale and diversity of our dataset can offer unique opportunities for semantic parsing researchers.

* Journal of Machine Learning Theory, Applications and Practice 2022

Deep Learning Models on CPUs: A Methodology for Efficient Training

Jun 20, 2022

Quchen Fu, Ramesh Chukka, Keith Achorn, Thomas Atta-fosu, Deepak R. Canchi, Zhongwei Teng, Jules White, Douglas C. Schmidt

Abstract: GPUs have been favored for training deep learning models due to their highly parallelized architecture. As a result, most studies on training optimization focus on GPUs. There is often a trade-off, however, between cost and efficiency when choosing the proper hardware for training. In particular, CPU servers can be beneficial if training on CPUs were more efficient, as they incur fewer hardware update costs and make better use of existing infrastructure. This paper makes several contributions to research on training deep learning models using CPUs. First, it presents a method for optimizing the training of deep learning models on Intel CPUs and a toolkit called ProfileDNN, which we developed to improve performance profiling. Second, we describe a generic training optimization method that guides our workflow and explore several case studies where we identified performance issues and then optimized the Intel Extension for PyTorch, resulting in an overall 2x training performance increase for the RetinaNet-ResNext50 model. Third, we show how to leverage the visualization capabilities of ProfileDNN, which enabled us to pinpoint bottlenecks and create a custom focal loss kernel that was two times faster than the official reference PyTorch implementation.
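
For readers unfamiliar with the focal loss mentioned in the abstract, the following is a minimal sketch of the standard sigmoid focal loss for a single binary prediction, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). It is not the authors' optimized kernel, which the abstract does not describe; the function name and the default values alpha=0.25 and gamma=2.0 are assumptions chosen for illustration.

```python
import math


def sigmoid_focal_loss(logit, target, alpha=0.25, gamma=2.0):
    """Standard focal loss for one binary prediction (illustrative sketch).

    logit  : raw model output for the positive class (assumed moderate in size,
             since this sketch does not guard against overflow in exp)
    target : 1 for a positive example, 0 for a negative example
    alpha, gamma : hypothetical defaults, not taken from the paper
    """
    p = 1.0 / (1.0 + math.exp(-logit))               # predicted probability of class 1
    p_t = p if target == 1 else 1.0 - p              # probability assigned to the true class
    alpha_t = alpha if target == 1 else 1.0 - alpha  # class-balancing weight
    # The (1 - p_t) ** gamma factor down-weights examples that are already easy.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With gamma set to 0 the modulating factor disappears and the expression reduces to an alpha-weighted cross-entropy, which is a convenient sanity check for any faster implementation.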

SA-SASV: An End-to-End Spoof-Aggregated Spoofing-Aware Speaker Verification System

Mar 24, 2022

Zhongwei Teng, Quchen Fu, Jules White, Maria E. Powell, Douglas C. Schmidt

Abstract: Research in the past several years has boosted the performance of automatic speaker verification systems and countermeasure systems, delivering low Equal Error Rates (EERs) on each system. However, research on joint optimization of both systems is still limited. The Spoofing-Aware Speaker Verification (SASV) 2022 challenge was proposed to encourage the development of integrated SASV systems with new metrics to evaluate joint model performance. This paper proposes an ensemble-free end-to-end solution, known as Spoof-Aggregated-SASV (SA-SASV), that builds a SASV system with multi-task classifiers optimized by multiple losses and that has more flexible training set requirements. The proposed system is trained on the ASVSpoof 2019 LA dataset, a spoof verification dataset with a small number of bonafide speakers. Results of SASV-EER indicate that the model performance can be further improved by training on complete automatic speaker verification and countermeasure datasets.
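
The Equal Error Rate used as the metric above can be sketched as follows. This is a simple threshold sweep written for illustration, not the challenge's official scoring script; the function name and the convention that a higher score means "accept" are assumptions.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    A trial is accepted when its score is >= the threshold (assumed convention).
    Returns the mean of the false acceptance and false rejection rates at the
    threshold where the two rates are closest.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False acceptance: a non-target trial scores at or above the threshold.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        # False rejection: a target trial scores below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

Under the assumption that a joint SASV-style metric treats both zero-effort impostor trials and spoofed trials as trials to reject, it could be obtained by passing the concatenation of those two score lists as `nontarget_scores`.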

* Update Experiment Results in ASV2019 protocol

FastAudio: A Learnable Audio Front-End for Spoof Speech Detection

Sep 06, 2021

Quchen Fu, Zhongwei Teng, Jules White, Maria Powell, Douglas C. Schmidt

Abstract: Voice assistants, such as smart speakers, have exploded in popularity. It is currently estimated that the smart speaker adoption rate has exceeded 35% in the US adult population. Manufacturers have integrated speaker identification technology, which attempts to determine the identity of the person speaking, to provide personalized services to different members of the same family. Speaker identification can also play an important role in controlling how the smart speaker is used. For example, it is not critical to correctly identify the user when playing music. However, when reading the user's email out loud, it is critical to verify that the speaker making the request is the authorized user. Speaker verification systems, which authenticate the speaker identity, are therefore needed as a gatekeeper to protect against various spoofing attacks that aim to impersonate the enrolled user. This paper compares popular learnable front-ends, which learn audio representations through joint (end-to-end) training with downstream tasks. We categorize the front-ends by defining two generic architectures and then analyze the filtering stages of both types in terms of learning constraints. We propose replacing fixed filterbanks with a learnable layer that can better adapt to anti-spoofing tasks. The proposed FastAudio front-end is then tested with two popular back-ends to measure the performance on the LA track of the ASVspoof 2019 dataset. The FastAudio front-end achieves a relative improvement of 27% when compared with fixed front-ends, outperforming all other learnable front-ends on this task.

Complementing Handcrafted Features with Raw Waveform Using a Light-weight Auxiliary Model

Sep 06, 2021

Zhongwei Teng, Quchen Fu, Jules White, Maria Powell, Douglas C. Schmidt

Abstract: An emerging trend in audio processing is capturing low-level speech representations from raw waveforms. These representations have shown promising results on a variety of tasks, such as speech recognition and speech separation. In theory, learning speech features via backpropagation gives the model greater flexibility than handcrafted features in how it represents data for different tasks. However, empirical results show that, in some tasks, such as voice spoof detection, handcrafted features are more competitive than learned features. Instead of evaluating handcrafted features and raw waveforms independently, this paper proposes an Auxiliary Rawnet model to complement handcrafted features with features learned from raw waveforms. A key benefit of the approach is that it can improve accuracy at a relatively low computational cost. The proposed Auxiliary Rawnet model is tested using the ASVspoof 2019 dataset, and the results indicate that a light-weight waveform encoder can potentially boost the performance of handcrafted-features-based encoders in exchange for a small amount of additional computational work.

NeurIPS 2020 NLC2CMD Competition: Translating Natural Language to Bash Commands

Mar 03, 2021

Mayank Agarwal, Tathagata Chakraborti, Quchen Fu, David Gros, Xi Victoria Lin, Jaron Maene, Kartik Talamadupula, Zhongwei Teng, Jules White

Abstract: The NLC2CMD Competition hosted at NeurIPS 2020 aimed to bring the power of natural language processing to the command line. Participants were tasked with building models that can transform descriptions of command line tasks in English to their Bash syntax. This is a report on the competition with details of the task, metrics, data, attempted solutions, and lessons learned.

* Competition URL: http://ibm.biz/nlc2cmd
