Daniel Wong

Explainable DNN-based Beamformer with Postfilter

Nov 16, 2024

Adi Cohen, Daniel Wong, Jung-Suk Lee, Sharon Gannot

Abstract: This paper introduces an explainable DNN-based beamformer with a postfilter (ExNet-BF+PF) for multichannel signal processing. Our approach combines the U-Net network with a beamformer structure to address this problem. The method involves a two-stage processing pipeline. In the first stage, time-invariant weights are applied to construct a multichannel spatial filter, namely a beamformer. In the second stage, a time-varying single-channel postfilter is applied at the beamformer output. Additionally, we incorporate an attention mechanism, inspired by its successful application in noisy and reverberant environments, to further improve speech enhancement. Furthermore, our study fills a gap in the existing literature by conducting a thorough spatial analysis of the network's performance. Specifically, we examine how the network utilizes spatial information during processing. This analysis yields valuable insights into the network's functionality, thereby enhancing our understanding of its overall performance. Experimental results demonstrate that our approach is not only straightforward to train but also yields superior results, obviating the necessity for prior knowledge of the speaker's activity.
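
The two-stage pipeline described in this abstract can be sketched in a few lines of NumPy. This is only an illustration of the signal flow, not the authors' implementation: in the paper the beamformer weights and the postfilter come from a network, whereas here the function name `beamform_then_postfilter`, the uniform weights, and the random mask are all placeholder assumptions.

```python
import numpy as np

def beamform_then_postfilter(stft, weights, mask):
    """Apply a time-invariant beamformer, then a time-varying postfilter.

    stft:    complex array, shape (channels, freqs, frames)
    weights: complex array, shape (channels, freqs); fixed over time
    mask:    real array, shape (freqs, frames); one gain per time-frequency bin
    """
    # Stage 1: y[f, t] = sum over channels of conj(w[c, f]) * x[c, f, t]
    beam_out = np.einsum("cf,cft->ft", np.conj(weights), stft)
    # Stage 2: single-channel, time-varying gain on the beamformer output
    return mask * beam_out

# Placeholder inputs (assumptions for illustration only)
rng = np.random.default_rng(0)
C, F, T = 4, 5, 6
stft = rng.standard_normal((C, F, T)) + 1j * rng.standard_normal((C, F, T))
weights = np.full((C, F), 1.0 / C, dtype=complex)  # plain channel averaging
mask = rng.uniform(0.0, 1.0, size=(F, T))

enhanced = beamform_then_postfilter(stft, weights, mask)
```

With the uniform weights used here, stage 1 reduces to averaging the channels, which makes the output easy to check by hand.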

All Neural Low-latency Directional Speech Extraction

Jul 05, 2024

Ashutosh Pandey, Sanha Lee, Juan Azcarreta, Daniel Wong, Buye Xu

Abstract: We introduce a novel all neural model for low-latency directional speech extraction. The model uses direction of arrival (DOA) embeddings from a predefined spatial grid, which are transformed and fused into a recurrent neural network based speech extraction model. This process enables the model to effectively extract speech from a specified DOA. Unlike previous methods that relied on hand-crafted directional features, the proposed model trains DOA embeddings from scratch using speech enhancement loss, making it suitable for low-latency scenarios. Additionally, it operates at a high frame rate, taking in DOA with each input frame, which enables it to adapt quickly to changing scenes in highly dynamic real-world scenarios. We provide extensive evaluation to demonstrate the model's efficacy in directional speech extraction, robustness to DOA mismatch, and its capability to quickly adapt to abrupt changes in DOA.

* Accepted for publication at INTERSPEECH 2024

On the Importance of Neural Wiener Filter for Resource Efficient Multichannel Speech Enhancement

Jan 15, 2024

Tsun-An Hsieh, Jacob Donley, Daniel Wong, Buye Xu, Ashutosh Pandey

Abstract: We introduce a time-domain framework for efficient multichannel speech enhancement, emphasizing low latency and computational efficiency. This framework incorporates two compact deep neural networks (DNNs) surrounding a multichannel neural Wiener filter (NWF). The first DNN enhances the speech signal to estimate NWF coefficients, while the second DNN refines the output from the NWF. The NWF, while conceptually similar to the traditional frequency-domain Wiener filter, undergoes a training process optimized for low-latency speech enhancement, involving fine-tuning of both analysis and synthesis transforms. Our research results illustrate that the NWF output, having minimal nonlinear distortions, attains performance levels akin to those of the first DNN, deviating from conventional Wiener filter paradigms. Training all components jointly outperforms sequential training, despite its simplicity. Consequently, this framework achieves superior performance with fewer parameters and reduced computational demands, making it a compelling solution for resource-efficient multichannel speech enhancement.

* Accepted for publication at ICASSP

Rethinking complex-valued deep neural networks for monaural speech enhancement

Jan 11, 2023

Haibin Wu, Ke Tan, Buye Xu, Anurag Kumar, Daniel Wong

Abstract: Despite multiple efforts made towards adopting complex-valued deep neural networks (DNNs), it remains an open question whether complex-valued DNNs are generally more effective than real-valued DNNs for monaural speech enhancement. This work is devoted to presenting a critical assessment by systematically examining complex-valued DNNs against their real-valued counterparts. Specifically, we investigate complex-valued DNN atomic units, including linear layers, convolutional layers, long short-term memory (LSTM), and gated linear units. By comparing complex- and real-valued versions of fundamental building blocks in the recently developed gated convolutional recurrent network (GCRN), we show how different mechanisms for basic blocks affect the performance. We also find that the use of complex-valued operations hinders the model capacity when the model size is small. In addition, we examine two recent complex-valued DNNs, i.e., deep complex convolutional recurrent network (DCCRN) and deep complex U-Net (DCUNET). Evaluation results show that both DNNs produce identical performance to their real-valued counterparts while requiring much more computation. Based on these comprehensive comparisons, we conclude that complex-valued DNNs do not provide a performance gain over their real-valued counterparts for monaural speech enhancement, and thus are less desirable due to their higher computational costs.
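
As background for the complex-versus-real comparison, the sketch below shows a standard identity: a complex-valued linear layer computes the same result as a real-valued layer acting on stacked real and imaginary parts, with the real weight matrix constrained to a particular block structure. This is a general fact about complex multiplication, not code from the paper; the helper names `complex_linear` and `real_equivalent` are hypothetical.

```python
import numpy as np

def complex_linear(W, x):
    """Complex-valued linear map y = W x."""
    return W @ x

def real_equivalent(W, x):
    """Same map using only real arithmetic on stacked [real; imag] parts."""
    Wr, Wi = W.real, W.imag
    # Structured real weight matrix: [[Wr, -Wi], [Wi, Wr]]
    W_block = np.block([[Wr, -Wi], [Wi, Wr]])
    x_stacked = np.concatenate([x.real, x.imag])
    y_stacked = W_block @ x_stacked
    n_out = W.shape[0]
    # Reassemble the complex output from its real and imaginary halves
    return y_stacked[:n_out] + 1j * y_stacked[n_out:]

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))
x = rng.standard_normal(4) + 1j * rng.standard_normal(4)

y_complex = complex_linear(W, x)
y_real = real_equivalent(W, x)
```

An unconstrained real layer of the same input and output size would have a free 6-by-8 weight matrix here, whereas the complex layer ties its four blocks together, which is one way to see the complex layer as a restricted special case.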

NICE-Beam: Neural Integrated Covariance Estimators for Time-Varying Beamformers

Dec 08, 2021

Jonah Casebeer, Jacob Donley, Daniel Wong, Buye Xu, Anurag Kumar

Abstract: Estimating a time-varying spatial covariance matrix for a beamforming algorithm is a challenging task, especially for wearable devices, as the algorithm must compensate for time-varying signal statistics due to rapid pose changes. In this paper, we propose Neural Integrated Covariance Estimators for Beamformers, NICE-Beam. NICE-Beam is a general technique for learning how to estimate time-varying spatial covariance matrices, which we apply to joint speech enhancement and dereverberation. It is based on training a neural network module to non-linearly track and leverage scene information across time. We integrate our solution into a beamforming pipeline, which enables simple training, faster than real-time inference, and a variety of test-time adaptation options. We evaluate the proposed model against a suite of baselines in scenes with both stationary and moving microphones. Our results show that the proposed method can outperform a hand-tuned estimator, despite the hand-tuned estimator using oracle source separation knowledge.
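
For context, a common fixed-rule way to track a time-varying spatial covariance matrix is exponential smoothing of per-frame outer products. The sketch below shows that generic recursion for a single frequency bin. It is an assumption for illustration: the abstract does not say which hand-tuned estimator served as the baseline, and the function name `recursive_covariance` and the forgetting factor `alpha` are hypothetical.

```python
import numpy as np

def recursive_covariance(frames, alpha=0.9):
    """Exponentially smoothed spatial covariance for one frequency bin.

    frames: complex array, shape (T, C), one multichannel snapshot per frame
    alpha:  forgetting factor in [0, 1); larger values adapt more slowly
    returns complex array, shape (T, C, C)
    """
    T, C = frames.shape
    cov = np.zeros((C, C), dtype=complex)
    out = np.empty((T, C, C), dtype=complex)
    for t in range(T):
        x = frames[t]
        # R_t = alpha * R_{t-1} + (1 - alpha) * x_t x_t^H
        cov = alpha * cov + (1.0 - alpha) * np.outer(x, np.conj(x))
        out[t] = cov
    return out

rng = np.random.default_rng(0)
frames = rng.standard_normal((20, 3)) + 1j * rng.standard_normal((20, 3))
covs = recursive_covariance(frames, alpha=0.9)
covs_no_memory = recursive_covariance(frames, alpha=0.0)
```

The single fixed `alpha` is the tuning burden: a value that works for a stationary array is too slow when the microphones move, which is the kind of trade-off a learned estimator is meant to avoid.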

Information Decoding and SDR Implementation of DFRC Systems Without Training Signals

Feb 21, 2021

Daniel Wong, Batu K. Chalise, Justin Metcalf, Moeness Amin

Abstract: Recent performance analysis of dual-function radar communications (DFRC) systems, which embed information using phase shift keying (PSK) into multiple-input multiple-output (MIMO) frequency hopping (FH) radar pulses, shows promising results for addressing spectrum sharing issues between radar and communications. However, the problem of decoding information at the communication receiver remains challenging, since the DFRC transmitter is typically assumed to transmit only information-embedded radar waveforms and not the training sequence. We propose a novel method for decoding information at the communication receiver without using training data, which is implemented using a software-defined radio (SDR). The performance of the SDR implementation is examined in terms of bit error rate (BER) as a function of signal-to-noise ratio (SNR) for differential binary and quadrature phase shift keying modulation schemes and compared with the BER versus SNR obtained with numerical simulations.
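
To illustrate why differential modulation suits a receiver with no training sequence, the sketch below shows textbook differential BPSK over a noiseless channel with an unknown constant phase rotation. It is not the decoding method proposed in the paper, which also handles the MIMO frequency-hopping radar waveform; the function names and the phase value are assumptions for illustration.

```python
import numpy as np

def dbpsk_encode(bits):
    """Differential BPSK: a 1 bit flips the phase, a 0 bit keeps it."""
    symbols = np.empty(len(bits) + 1, dtype=complex)
    symbols[0] = 1.0  # reference symbol, carries no data
    for k, b in enumerate(bits):
        symbols[k + 1] = symbols[k] * (-1.0 if b else 1.0)
    return symbols

def dbpsk_decode(received):
    """Decode by comparing each sample with the previous one.

    Multiplying by the conjugate of the previous sample cancels any
    constant phase offset, so no channel estimate is needed.
    """
    diff = received[1:] * np.conj(received[:-1])
    return (diff.real < 0).astype(int)

rng = np.random.default_rng(1)
bits = rng.integers(0, 2, size=64)
unknown_phase = np.exp(1j * 1.234)  # arbitrary rotation, unknown to the receiver
received = unknown_phase * dbpsk_encode(bits)
decoded = dbpsk_decode(received)
```

In this noiseless setting the decoded bits match the transmitted bits exactly; adding noise to `received` and counting mismatches would give a simulated BER-versus-SNR curve of the kind the abstract mentions.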

Transferable Graph Optimizers for ML Compilers

Oct 21, 2020

Yanqi Zhou, Sudip Roy, Amirali Abdolrashidi, Daniel Wong, Peter Ma, Qiumin Xu, Hanxiao Liu, Mangpo Phitchaya Phothilimtha, Shen Wang, Anna Goldie(+2 more)

Abstract: Most compilers for machine learning (ML) frameworks need to solve many correlated optimization problems to generate efficient machine code. Current ML compilers rely on heuristics-based algorithms to solve these optimization problems one at a time. However, this approach is not only hard to maintain but often leads to sub-optimal solutions, especially for newer model architectures. Existing learning-based approaches in the literature are sample inefficient, tackle a single optimization problem, and do not generalize to unseen graphs, making them infeasible to deploy in practice. To address these limitations, we propose an end-to-end, transferable deep reinforcement learning method for computational graph optimization (GO), based on a scalable sequential attention mechanism over an inductive graph neural network. GO generates decisions on the entire graph rather than on each individual node autoregressively, drastically speeding up the search compared to prior methods. Moreover, we propose recurrent attention layers to jointly optimize dependent graph optimization tasks and demonstrate 33%-60% speedup on three graph optimization tasks compared to TensorFlow default optimization. On a diverse set of representative graphs consisting of up to 80,000 nodes, including Inception-v3, Transformer-XL, and WaveNet, GO achieves on average 21% improvement over human experts and 18% improvement over the prior state of the art with 15x faster convergence, on a device placement task evaluated in real systems.

* NeurIPS 2020
* arXiv admin note: text overlap with arXiv:1910.01578

GDP: Generalized Device Placement for Dataflow Graphs

Sep 28, 2019

Yanqi Zhou, Sudip Roy, Amirali Abdolrashidi, Daniel Wong, Peter C. Ma, Qiumin Xu, Ming Zhong, Hanxiao Liu, Anna Goldie, Azalia Mirhoseini(+1 more)

Abstract: Runtime and scalability of large neural networks can be significantly affected by the placement of operations in their dataflow graphs on suitable devices. With increasingly complex neural network architectures and heterogeneous device characteristics, finding a reasonable placement is extremely challenging even for domain experts. Most existing automated device placement approaches are impractical due to the significant amount of compute required and their inability to generalize to new, previously held-out graphs. To address both limitations, we propose an efficient end-to-end method based on a scalable sequential attention mechanism over a graph neural network that is transferable to new graphs. On a diverse set of representative deep learning models, including Inception-v3, AmoebaNet, Transformer-XL, and WaveNet, our method on average achieves 16% improvement over human experts and 9.2% improvement over the prior art with 15 times faster convergence. To further reduce the computation cost, we pre-train the policy network on a set of dataflow graphs and use a superposition network to fine-tune it on each individual graph, achieving state-of-the-art performance on large hold-out graphs with over 50k nodes, such as an 8-layer GNMT.
