Boxiang Wang

Preventing output saturation in active noise control: An output-constrained Kalman filter approach

Dec 25, 2024

Junwei Ji, Dongyuan Shi, Boxiang Wang, Xiaoyi Shen, Zhengding Luo, Woon-Seng Gan

Abstract: The Kalman filter (KF)-based active noise control (ANC) system demonstrates superior tracking and faster convergence compared to the least mean square (LMS) method, particularly in dynamic noise cancellation scenarios. However, in environments with extremely high noise levels, the power of the control signal can exceed the system's rated output power due to hardware limitations, leading to output saturation and subsequent non-linearity. To mitigate this issue, a modified KF with an output constraint is proposed. In this approach, the disturbance, treated as a measurement, is re-scaled by a constraint factor, which is determined by the system's rated power, the secondary path gain, and the disturbance power. As a result, the output power of the system, i.e., the power of the control signal, is indirectly constrained within the maximum output of the system, ensuring stability. Simulation results indicate that the proposed algorithm not only achieves rapid suppression of dynamic noise but also effectively prevents non-linearity due to output saturation, highlighting its practical significance.
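The rescaling idea can be illustrated with a small sketch. The abstract does not state the formula for the constraint factor, so the expression below is an assumption: if full cancellation needs a control signal of power roughly P_d / g^2 (disturbance power over squared secondary-path gain), then scaling the disturbance by min(1, sqrt(P_rated * g^2 / P_d)) keeps the required control power at or below the rated power. The function name and its parameters are hypothetical and are not taken from the paper.

```python
import numpy as np

def constraint_factor(disturbance, rated_power, path_gain):
    """Hypothetical constraint factor (assumed form, not the paper's formula).

    disturbance : samples of the disturbance signal
    rated_power : maximum output power of the control source
    path_gain   : scalar gain standing in for the secondary path
    """
    disturbance_power = float(np.mean(np.square(disturbance)))
    if disturbance_power == 0.0:
        return 1.0  # nothing to cancel, so no rescaling is needed
    # Control power needed to fully cancel the disturbance through the path.
    required_power = disturbance_power / path_gain**2
    # Shrink the target only when the requirement exceeds the rated power.
    return min(1.0, float(np.sqrt(rated_power / required_power)))

rng = np.random.default_rng(0)
d = 10.0 * rng.standard_normal(4096)       # loud disturbance
lam = constraint_factor(d, rated_power=1.0, path_gain=0.5)
d_scaled = lam * d                          # rescaled "measurement"
control_power_needed = float(np.mean(d_scaled**2)) / 0.5**2
```

With the rescaled disturbance, the control power needed under this simplified model equals the rated power instead of exceeding it; a quiet disturbance is left unchanged because the factor is capped at 1.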

Transferable Selective Virtual Sensing Active Noise Control Technique Based on Metric Learning

Sep 09, 2024

Boxiang Wang, Dongyuan Shi, Zhengding Luo, Xiaoyi Shen, Junwei Ji, Woon-Seng Gan

Abstract: Virtual sensing (VS) technology enables active noise control (ANC) systems to attenuate noise at virtual locations distant from the physical error microphones. Appropriate auxiliary filters (AF) can significantly enhance the effectiveness of VS approaches. The selection of appropriate AF for various types of noise can be automatically achieved using convolutional neural networks (CNNs). However, training the CNN model for different ANC systems is often labour-intensive and time-consuming. To tackle this problem, we propose a novel method, Transferable Selective VS, by integrating metric-learning technology into CNN-based VS approaches. The Transferable Selective VS method allows a pre-trained CNN to be applied directly to new ANC systems without requiring retraining, and it can handle unseen noise types. Numerical simulations demonstrate the effectiveness of the proposed method in attenuating sudden-varying broadband noises and real-world noises.

fastkqr: A Fast Algorithm for Kernel Quantile Regression

Aug 10, 2024

Qian Tang, Yuwen Gu, Boxiang Wang

Abstract: Quantile regression is a powerful tool for robust and heterogeneous learning that has seen applications in a diverse range of applied areas. However, its broader application is often hindered by the substantial computational demands arising from the non-smooth quantile loss function. In this paper, we introduce a novel algorithm named fastkqr, which significantly advances the computation of quantile regression in reproducing kernel Hilbert spaces. The core of fastkqr is a finite smoothing algorithm that magically produces exact regression quantiles, rather than approximations. To further accelerate the algorithm, we equip fastkqr with a novel spectral technique that carefully reutilizes matrix computations. In addition, we extend fastkqr to accommodate a flexible kernel quantile regression with a data-driven crossing penalty, addressing the interpretability challenges of crossing quantile curves at multiple levels. We have implemented fastkqr in a publicly available R package. Extensive simulations and real applications show that fastkqr matches the accuracy of state-of-the-art algorithms but can operate up to an order of magnitude faster.
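For readers unfamiliar with the non-smooth quantile loss mentioned above, the following minimal sketch shows the standard check (pinball) loss next to one smoothed surrogate. The surrogate is a generic Huber-style smoothing chosen for illustration; it is an assumption and not necessarily the smoothing used inside fastkqr. All function names are hypothetical, and the code is written in Python rather than R.

```python
import numpy as np

def check_loss(r, tau):
    """Standard quantile (check) loss: rho_tau(r) = r * (tau - 1[r < 0])."""
    r = np.asarray(r, dtype=float)
    return r * (tau - (r < 0).astype(float))

def smoothed_check_loss(r, tau, gamma):
    """Illustrative smoothing: quadratic on [-gamma, gamma], check loss outside.

    The quadratic piece matches the check loss in value and slope at +/- gamma,
    so the surrogate is differentiable everywhere, including at r = 0.
    """
    r = np.asarray(r, dtype=float)
    quad = r**2 / (4.0 * gamma) + (tau - 0.5) * r + gamma / 4.0
    return np.where(np.abs(r) <= gamma, quad, check_loss(r, tau))

# Minimizing the mean check loss over a constant recovers a sample quantile.
y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
tau = 0.5
losses = [check_loss(y - q, tau).mean() for q in y]  # candidates: data points
best = y[int(np.argmin(losses))]                      # the sample median
```

The kink of the check loss at zero is what makes direct optimization awkward; the surrogate replaces only a small neighborhood of that kink and agrees with the check loss everywhere else.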

Computation-efficient Virtual Sensing Approach with Multichannel Adjoint Least Mean Square Algorithm

May 23, 2024

Boxiang Wang, Junwei Ji, Xiaoyi Shen, Dongyuan Shi, Woon-Seng Gan

Abstract: Multichannel active noise control (ANC) systems are designed to create a large zone of quietness (ZoQ) around the error microphones; however, the placement of these microphones often presents challenges due to physical limitations. The virtual sensing technique, which effectively suppresses noise far from the physical error microphones, is one of the most promising solutions. Nevertheless, the conventional multichannel virtual sensing ANC (MVANC) system based on the multichannel filtered reference least mean square (MCFxLMS) algorithm often suffers from high computational complexity. This paper proposes a feedforward MVANC system that incorporates the multichannel adjoint least mean square (MCALMS) algorithm to overcome these limitations effectively. Computational analysis demonstrates the improvement in computational efficiency, and numerical simulations exhibit noise reduction performance at virtual locations comparable to that of the conventional MCFxLMS algorithm. Additionally, the effects of varied tuning noises on system performance are investigated, providing insights into optimizing MVANC systems.

Implementation of the Feedforward Multichannel Virtual Sensing Active Noise Control (MVANC) by Using MATLAB

May 17, 2024

Boxiang Wang

Abstract: The multichannel virtual sensing active noise control (MVANC) methodology is an advanced approach that may provide a wide area of silence at specific virtual positions that are distant from the physical error microphones. Currently, there is a scarcity of open-source programs available for the MVANC algorithm. This work presents a MATLAB code for the MVANC approach, utilizing the multichannel filtered-x least mean square (MCFxLMS) algorithm. The code is designed to be applicable to systems with any number of channels. The code can be found on GitHub.

The ART of Transfer Learning: An Adaptive and Robust Pipeline

Apr 30, 2023

Boxiang Wang, Yunan Wu, Chenglong Ye

Abstract: Transfer learning is an essential tool for improving the performance of primary tasks by leveraging information from auxiliary data resources. In this work, we propose Adaptive Robust Transfer Learning (ART), a flexible pipeline for performing transfer learning with generic machine learning algorithms. We establish the non-asymptotic learning theory of ART, providing a provable theoretical guarantee for achieving adaptive transfer while preventing negative transfer. Additionally, we introduce an ART-integrated-aggregating machine that produces a single final model when multiple candidate algorithms are considered. We demonstrate the promising performance of ART through extensive empirical studies on regression, classification, and sparse learning. We further present a real-data analysis for a mortality study.

Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training

Oct 28, 2021

Zhengda Bian, Hongxin Liu, Boxiang Wang, Haichen Huang, Yongbin Li, Chuanrui Wang, Fan Cui, Yang You

Abstract: The Transformer architecture has improved the performance of deep learning models in domains such as Computer Vision and Natural Language Processing. Together with better performance come larger model sizes. This imposes challenges to the memory wall of current accelerator hardware such as GPUs. It is never ideal to train large models such as Vision Transformer, BERT, and GPT on a single GPU or a single machine. There is an urgent demand to train models in a distributed environment. However, distributed training, especially model parallelism, often requires domain expertise in computer systems and architecture. It remains a challenge for AI researchers to implement complex distributed training solutions for their models. In this paper, we introduce Colossal-AI, which is a unified parallel training system designed to seamlessly integrate different paradigms of parallelization techniques including data parallelism, pipeline parallelism, multiple tensor parallelism, and sequence parallelism. Colossal-AI aims to support the AI community in writing distributed models in the same way they normally write models. This allows them to focus on developing the model architecture and separates the concerns of distributed training from the development process. The documentation can be found at https://www.colossalai.org and the source code can be found at https://github.com/hpcaitech/ColossalAI.

2.5-dimensional distributed model training

May 30, 2021

Boxiang Wang, Qifan Xu, Zhengda Bian, Yang You

Abstract: Data parallelism does a good job of speeding up training. However, when the memory of a single device cannot hold the whole model, data parallelism alone does not help. Another option is to split the model by operator, or horizontally. Megatron-LM introduced a 1-Dimensional distributed method to use GPUs to speed up the training process. Optimus is a 2D solution for distributed tensor parallelism. However, these methods have a high communication overhead and a low scaling efficiency on large-scale computing clusters. To solve this problem, we investigate 2.5-Dimensional distributed tensor parallelism. Introduced by Solomonik et al., 2.5-Dimensional Matrix Multiplication is an effective method that runs multiple instances of Cannon's algorithm at the same time to increase efficiency. Given the many restrictions of Cannon's algorithm and its large number of shift operations, a new method of 2.5-dimensional matrix multiplication is needed to enhance performance. Absorbing the essence of both SUMMA and 2.5-Dimensional Matrix Multiplication, we introduce SUMMA2.5-LM for language models to overcome the unnecessary transmission cost resulting from the increasing size of language model parallelism. Compared to previous 1D and 2D model parallelization of language models, SUMMA2.5-LM reduces the transmission cost on each layer, yielding a 1.45X efficiency gain in our weak-scaling results comparing the 2.5-D [4,4,4] arrangement with the 2-D [8,8,1] arrangement.
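As background for the SUMMA-based approach, the sketch below simulates the core SUMMA accumulation on a single process: the product C = AB is built up as a sum of panel products. In a real distributed run, each column panel of A would be broadcast along process rows and each row panel of B along process columns. This is only the textbook SUMMA accumulation, not SUMMA2.5-LM; the function name and panel size are illustrative assumptions, and no communication is modeled.

```python
import numpy as np

def summa_single_process(A, B, panel=2):
    """Accumulate C = A @ B one panel at a time (single-process illustration)."""
    m, k = A.shape
    k2, n = B.shape
    if k != k2:
        raise ValueError("inner dimensions of A and B must match")
    C = np.zeros((m, n), dtype=np.result_type(A, B))
    for start in range(0, k, panel):
        stop = min(start + panel, k)
        # In distributed SUMMA these two panels are what gets broadcast;
        # every process then updates its local block of C with their product.
        C += A[:, start:stop] @ B[start:stop, :]
    return C

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 5))
B = rng.standard_normal((5, 4))
C = summa_single_process(A, B, panel=2)
```

Because each step only needs one panel of A and one panel of B, the scheme avoids the shift operations of Cannon's algorithm, which is the property the abstract refers to when contrasting the two.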

Maximizing Parallelism in Distributed Training for Huge Neural Networks

May 30, 2021

Zhengda Bian, Qifan Xu, Boxiang Wang, Yang You

Abstract: Recent Natural Language Processing techniques have been advancing state-of-the-art performance at an incredible speed. Training huge language models is therefore an imperative demand in both industry and academia. However, huge language models impose challenges on both hardware and software. Graphics processing units (GPUs) are iterated frequently to meet the exploding demand, and a variety of ASICs such as TPUs have been developed. However, there is still a tension between the fast growth of extremely large models and the fact that Moore's law is approaching its end. To this end, many model parallelism techniques have been proposed to distribute the model parameters across multiple devices, so as to alleviate the pressure on both memory and computation. Our work is the first to introduce 3-dimensional model parallelism for expediting huge language models. By reaching a perfect load balance, our approach incurs smaller memory and communication costs than existing state-of-the-art 1-D and 2-D model parallelism. Our experiments on 64 V100 GPUs at TACC show that our 3-D parallelism outperforms 1-D and 2-D parallelism with 2.32x and 1.57x speedups, respectively.

* Technical Report of NUS HPC-AI Lab (https://ai.comp.nus.edu.sg). The leading two authors have equal contributions

Partially Interpretable Estimators (PIE): Black-Box-Refined Interpretable Machine Learning

May 06, 2021

Tong Wang, Jingyi Yang, Yunyi Li, Boxiang Wang

Abstract: We propose Partially Interpretable Estimators (PIE), which attribute a prediction to individual features via an interpretable model, while a (possibly) small part of the PIE prediction is attributed to the interaction of features via a black-box model, with the goal of boosting predictive performance while maintaining interpretability. As such, the interpretable model captures the main contributions of features, and the black-box model attempts to complement the interpretable piece by capturing the "nuances" of feature interactions as a refinement. We design an iterative training algorithm to jointly train the two types of models. Experimental results show that PIE is highly competitive with black-box models while outperforming interpretable baselines. In addition, the understandability of PIE is comparable to that of simple linear models, as validated via a human evaluation.
