Abstract:This paper presents a residential audio dataset to support sound event detection research for smart home applications aimed at promoting wellbeing for older adults. The dataset is constructed by deploying audio recording systems in the homes of 8 participants aged 55-80 years for a 7-day period. Acoustic characteristics are documented through detailed floor plans and construction material information to enable replication of the recording environments for AI model deployment. A novel automated speech removal pipeline is developed, using pre-trained audio neural networks to detect and remove segments containing spoken voice, while preserving segments containing other sound events. The resulting dataset consists of privacy-compliant audio recordings that accurately capture the soundscapes and activities of daily living within residential spaces. The paper details the dataset creation methodology, the speech removal pipeline utilizing cascaded model architectures, and an analysis of the vocal label distribution to validate the speech removal process. This dataset enables the development and benchmarking of sound event detection models tailored specifically for in-home applications.
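As a rough illustration of the segment-wise speech removal idea described in the abstract above, the following is a minimal sketch: the audio is split into fixed-length windows and windows whose estimated speech probability exceeds a threshold are discarded. The `speech_probability` callable, the window length, and the threshold are illustrative assumptions standing in for the paper's actual pipeline; in practice the callable would wrap a pre-trained audio tagger (e.g. a PANNs-style model).

```python
import numpy as np

def remove_speech(audio, sample_rate, speech_probability,
                  window_s=1.0, threshold=0.5):
    """Keep only windows whose estimated speech probability is below `threshold`.

    `speech_probability(segment, sample_rate)` is a user-supplied callable
    wrapping a pre-trained audio tagger that returns the probability that
    the segment contains speech.
    """
    window = int(window_s * sample_rate)
    kept = []
    for start in range(0, len(audio), window):
        segment = audio[start:start + window]
        if speech_probability(segment, sample_rate) < threshold:
            kept.append(segment)           # non-speech window: keep for the dataset
    return np.concatenate(kept) if kept else np.zeros(0, dtype=audio.dtype)

# Example with a dummy predictor (always "no speech"); in practice this
# would call the tagging model's 'Speech' class output.
sr = 16000
audio = np.random.randn(10 * sr).astype(np.float32)
cleaned = remove_speech(audio, sr, lambda seg, _sr: 0.0)
```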
Abstract:The broadcasting industry is increasingly adopting IP techniques, revolutionising both live and pre-recorded content production, from news gathering to live music events. IP broadcasting allows for the transport of audio and video signals in an easily configurable way, aligning with modern networking techniques. This shift towards an IP workflow allows for much greater flexibility, not only in routing signals but also in integrating tools using standard web development techniques. One such tool is live audio tagging, which has a number of uses in content production, ranging from automated closed captioning to identifying unwanted sound events within a scene. In this paper, we describe the process of containerising an audio tagging model into a microservice, a small segregated code module that can be integrated into a multitude of different network setups. The goal is to develop a modular, accessible, and flexible tool capable of seamless deployment into broadcasting workflows of all sizes, from small productions to large corporations. Challenges surrounding the latency of the selected audio tagging model and its effect on the usefulness of the end product are discussed.
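To make the microservice idea concrete, below is a minimal sketch of an HTTP endpoint wrapping an audio tagging model, assuming FastAPI; the `tag_audio` function, route name, and response format are placeholders invented for illustration, not the interface described in the paper.

```python
# Minimal audio-tagging microservice sketch (FastAPI).
# `tag_audio` is a placeholder for the real model inference.
from fastapi import FastAPI, File, UploadFile

app = FastAPI()

def tag_audio(raw_bytes: bytes) -> list:
    """Placeholder: decode the audio bytes, run the tagging model,
    and return a list of {label, probability} entries."""
    return [{"label": "Speech", "probability": 0.9}]  # dummy output

@app.post("/tags")
async def tags(file: UploadFile = File(...)):
    audio_bytes = await file.read()
    return {"filename": file.filename, "tags": tag_audio(audio_bytes)}

# Run with: uvicorn service:app --host 0.0.0.0 --port 8000
# The service can then be containerised and dropped into an IP
# broadcasting network alongside other tools.
```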
Abstract:Personalization, the ability to tailor a system to individual users, is an essential factor in user experience with natural language processing (NLP) systems. With the emergence of Large Language Models (LLMs), a key question is how to leverage these models to better personalize user experiences. To personalize a language model's output, a straightforward approach is to incorporate past user data into the language model prompt, but this can result in lengthy inputs that exceed input-length limits and incur latency and cost issues. Existing approaches tackle such challenges by selectively extracting relevant user data (i.e., selective retrieval) to construct a prompt for downstream tasks. However, retrieval-based methods are limited by potential information loss, a lack of deeper user understanding, and cold-start challenges. To overcome these limitations, we propose a novel summary-augmented approach that extends retrieval-augmented personalization with task-aware user summaries generated by LLMs. The summaries can be generated and stored offline, enabling real-world systems with runtime constraints, such as voice assistants, to leverage the power of LLMs. Experiments show that, with 75% less retrieved user data, our method is on par with or outperforms retrieval augmentation on most tasks in the LaMP personalization benchmark. We demonstrate that offline summarization via LLMs and runtime retrieval enable better personalization performance on a range of tasks under practical constraints.
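A minimal sketch of how an offline user summary might be combined with a few retrieved history items at prompt-construction time: the term-overlap retrieval scorer and the prompt template below are invented for illustration and are not the exact setup used in the paper.

```python
def score(query: str, item: str) -> int:
    """Naive relevance: count shared lowercase tokens (illustrative only)."""
    return len(set(query.lower().split()) & set(item.lower().split()))

def build_prompt(task_input: str, user_summary: str,
                 user_history: list, k: int = 2) -> str:
    """Combine an offline, task-aware user summary with top-k retrieved items."""
    retrieved = sorted(user_history, key=lambda it: score(task_input, it),
                       reverse=True)[:k]
    history_block = "\n".join(f"- {it}" for it in retrieved)
    return (f"User summary:\n{user_summary}\n\n"
            f"Relevant user history:\n{history_block}\n\n"
            f"Task:\n{task_input}\n")

# The summary is generated offline by an LLM and cached, so only cheap
# retrieval and string assembly happen at runtime.
prompt = build_prompt(
    "Suggest a title for my review of a noise-cancelling headphone",
    "The user writes concise, technical product reviews about audio gear.",
    ["Review: budget USB microphones", "Review: over-ear headphones for travel"],
)
print(prompt)
```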
Abstract:Convolutional neural networks (CNNs) have exhibited state-of-the-art performance in various audio classification tasks. However, their real-time deployment remains a challenge on resource-constrained devices like embedded systems. In this paper, we analyze how the performance of large-scale pretrained audio neural networks designed for audio pattern recognition changes when deployed on hardware such as the Raspberry Pi. We empirically study the effect of CPU temperature, microphone quality, and audio signal volume on performance. Our experiments reveal that continuous CPU usage increases the temperature, which can trigger an automated slowdown mechanism in the Raspberry Pi, impacting inference latency. Microphone quality, particularly with affordable devices like the Google AIY Voice Kit, and audio signal volume also affect system performance. In the course of our investigation, we encounter substantial complications linked to library compatibility and the unique processor architecture requirements of the Raspberry Pi, making the process less straightforward than on conventional computers (PCs). Our observations, while presenting challenges, pave the way for future researchers to develop more compact machine learning models, design heat-dissipative hardware, and select appropriate microphones when AI models are deployed for real-time applications on edge devices. All related assets and an interactive demo can be found on GitHub.
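As a small sketch of the kind of measurement described above, the snippet below logs CPU temperature alongside per-inference latency, which makes thermal throttling visible over a long run. It assumes the standard sysfs thermal path on Raspberry Pi OS (`/sys/class/thermal/thermal_zone0/temp`) and uses a placeholder `run_inference` function rather than the actual model.

```python
import time

THERMAL_PATH = "/sys/class/thermal/thermal_zone0/temp"  # standard on Raspberry Pi OS

def cpu_temperature_c() -> float:
    """Read the SoC temperature in degrees Celsius from sysfs."""
    with open(THERMAL_PATH) as f:
        return int(f.read().strip()) / 1000.0

def run_inference(audio_clip) -> None:
    """Placeholder for a call to the audio tagging model."""
    time.sleep(0.1)  # simulate work

def profile(clips, log_path="latency_temp.csv"):
    """Log temperature and latency per clip to spot thermal slowdown."""
    with open(log_path, "w") as log:
        log.write("clip,temperature_c,latency_s\n")
        for i, clip in enumerate(clips):
            start = time.perf_counter()
            run_inference(clip)
            latency = time.perf_counter() - start
            log.write(f"{i},{cpu_temperature_c():.1f},{latency:.3f}\n")
```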
Abstract:Sounds carry an abundance of information about activities and events in our everyday environment, such as traffic noise, road works, music, or people talking. Recent machine learning methods, such as convolutional neural networks (CNNs), have been shown to be able to automatically recognize sound activities, a task known as audio tagging. One such method, pre-trained audio neural networks (PANNs), provides a neural network which has been pre-trained on over 500 sound classes from the publicly available AudioSet dataset, and can be used as a baseline or starting point for other tasks. However, the existing PANNs model has a high computational complexity and large storage requirement. This could limit the potential for deploying PANNs on resource-constrained devices, such as on-the-edge sound sensors, and could lead to high energy consumption if many such devices were deployed. In this paper, we reduce the computational complexity and memory requirement of the PANNs model by taking a pruning approach to eliminate redundant parameters. The resulting Efficient PANNs (E-PANNs) model, which requires 36% fewer computations and 70% less memory, also slightly improves the sound recognition (audio tagging) performance. The code for the E-PANNs model has been released under an open source license.
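For a flavour of filter pruning in general (not necessarily the exact criterion used to derive E-PANNs), the sketch below ranks the filters of a PyTorch convolutional layer by their L1 norm and keeps only the strongest fraction; the keep ratio is arbitrary and chosen only for illustration.

```python
import torch
import torch.nn as nn

def prune_conv_filters(conv: nn.Conv2d, keep_ratio: float = 0.7) -> nn.Conv2d:
    """Return a smaller Conv2d keeping the filters with the largest L1 norm.

    Illustrative only: a real pruning pipeline must also adjust the next
    layer's input channels and usually fine-tunes the pruned network.
    """
    weight = conv.weight.data                      # (out_ch, in_ch, kH, kW)
    scores = weight.abs().sum(dim=(1, 2, 3))       # L1 norm per filter
    n_keep = max(1, int(keep_ratio * weight.size(0)))
    keep = torch.topk(scores, n_keep).indices

    pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    pruned.weight.data = weight[keep].clone()
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep].clone()
    return pruned

# Example: shrink a layer from 128 to 64 filters.
layer = nn.Conv2d(64, 128, kernel_size=3, padding=1)
smaller = prune_conv_filters(layer, keep_ratio=0.5)
```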
Abstract:Convolutional neural networks (CNNs) are commonplace in high-performing solutions to many real-world problems, such as audio classification. CNNs have many parameters and filters, with some having a larger impact on performance than others. This means that networks may contain many unnecessary filters, increasing a CNN's computation and memory requirements while providing limited performance benefits. To make CNNs more efficient, we propose a pruning framework that eliminates filters with the highest "commonality". We measure this commonality using the graph-theoretic concept of "centrality". We hypothesise that a filter with high centrality should be eliminated as it represents commonality and can be replaced by other filters without much effect on the network's performance. An experimental evaluation of the proposed framework is performed on acoustic scene classification and audio tagging. On the DCASE 2021 Task 1A baseline network, our proposed method reduces computations per inference by 71% with 50% fewer parameters, at less than a two percentage point drop in accuracy compared to the original network. For large-scale CNNs such as PANNs designed for audio tagging, our method reduces computations per inference by 24% with 41% fewer parameters while slightly improving performance.
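To illustrate the centrality idea at a high level, the sketch below builds a cosine-similarity graph over the (flattened) filters of one layer and scores each filter by a simple degree-style centrality; the specific centrality measure, similarity metric, and edge threshold here are illustrative assumptions, not necessarily the ones used in the paper.

```python
import numpy as np

def centrality_scores(filters: np.ndarray, edge_threshold: float = 0.7) -> np.ndarray:
    """Degree-style centrality of each filter in a cosine-similarity graph.

    `filters` has shape (n_filters, n_weights), i.e. each filter flattened.
    An edge connects two filters whose cosine similarity exceeds the threshold.
    """
    normed = filters / (np.linalg.norm(filters, axis=1, keepdims=True) + 1e-12)
    similarity = np.abs(normed @ normed.T)          # pairwise cosine similarity
    adjacency = (similarity > edge_threshold).astype(float)
    np.fill_diagonal(adjacency, 0.0)
    return adjacency.sum(axis=1)                    # degree centrality

def filters_to_prune(filters: np.ndarray, n_prune: int) -> np.ndarray:
    """Indices of the most 'common' filters (highest centrality) to remove."""
    scores = centrality_scores(filters)
    return np.argsort(scores)[::-1][:n_prune]

# Example: 128 filters of a 3x3 conv with 64 input channels.
rng = np.random.default_rng(0)
w = rng.standard_normal((128, 64 * 3 * 3))
print(filters_to_prune(w, n_prune=16))
```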
Abstract:Convolutional neural networks (CNNs) have shown state-of-the-art performance in various applications. However, CNNs are resource-hungry due to their high computational complexity and memory storage requirements. Recent efforts toward achieving computational efficiency in CNNs involve filter pruning methods that eliminate some of the filters in CNNs based on the "importance" of the filters. The majority of existing filter pruning methods are either "active", which use a dataset and generate feature maps to quantify filter importance, or "passive", which compute filter importance using the entry-wise norm of the filters without involving data. Under a high pruning ratio, where a large number of filters are to be pruned from the network, the entry-wise norm methods eliminate relatively small-norm filters without considering their significance in producing the node output, resulting in performance degradation. To address this, we present a passive filter pruning method where filters are pruned based on their contribution to the output, measured by the operator norm of the filters. The proposed pruning method generalizes better across various CNNs than entry-wise norm-based pruning methods. Compared to existing active filter pruning methods, the proposed pruning method is at least 4.5 times faster in computing filter importance and achieves similar performance. The efficacy of the proposed pruning method is evaluated on audio scene classification and image classification using various CNN architectures such as VGGish, DCASE21_Net, VGG-16, and ResNet-50.
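The following is a minimal sketch of scoring filters by an operator norm: each filter is flattened to a matrix and its largest singular value is used as the importance score. How exactly the filter is reshaped and how scores are combined is not specified in the abstract, so the layout below is an assumption made for illustration.

```python
import numpy as np

def operator_norm_scores(weights: np.ndarray) -> np.ndarray:
    """Largest singular value of each filter, used as an importance score.

    `weights` has shape (out_ch, in_ch, kH, kW); each filter is reshaped to
    a 2-D matrix of shape (in_ch, kH * kW) before taking the spectral norm.
    This reshaping is one reasonable choice, assumed for illustration.
    """
    out_ch, in_ch, kh, kw = weights.shape
    scores = np.empty(out_ch)
    for i in range(out_ch):
        matrix = weights[i].reshape(in_ch, kh * kw)
        scores[i] = np.linalg.norm(matrix, ord=2)   # largest singular value
    return scores

# Keep the filters with the largest operator norm, prune the rest.
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 32, 3, 3))
keep = np.argsort(operator_norm_scores(w))[::-1][:32]
```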
Abstract:Convolutional neural networks (CNNs) have shown great success in various applications. However, the computational complexity and memory storage of CNNs are a bottleneck for their deployment on resource-constrained devices. Recent efforts towards reducing the computation cost and the memory overhead of CNNs involve similarity-based passive filter pruning methods. Similarity-based passive filter pruning methods compute a pairwise similarity matrix for the filters and eliminate a few similar filters to obtain a small pruned CNN. However, the computational complexity of computing the pairwise similarity matrix is high, particularly when a convolutional layer has many filters. To reduce the computational complexity in obtaining the pairwise similarity matrix, we propose an efficient method where the complete pairwise similarity matrix is approximated from only a few of its columns by using a Nyström approximation method. The proposed efficient similarity-based passive filter pruning method is 3 times faster and gives the same accuracy at the same reduction in computations compared to the similarity-based pruning method that computes the complete pairwise similarity matrix. Apart from this, the proposed efficient similarity-based pruning method performs similarly to or better than existing norm-based pruning methods. The efficacy of the proposed pruning method is evaluated on CNNs such as the DCASE 2021 Task 1A baseline network and a VGGish network designed for acoustic scene classification.
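A minimal numpy sketch of the Nyström idea: approximate the full pairwise similarity matrix from a few sampled columns. The similarity measure (cosine), the number of landmark filters, and the uniform sampling strategy below are illustrative assumptions rather than the paper's exact configuration.

```python
import numpy as np

def nystrom_similarity(filters: np.ndarray, n_landmarks: int = 16,
                       seed: int = 0) -> np.ndarray:
    """Approximate the full cosine-similarity matrix of the filters
    from only `n_landmarks` of its columns (Nyström approximation)."""
    rng = np.random.default_rng(seed)
    normed = filters / (np.linalg.norm(filters, axis=1, keepdims=True) + 1e-12)

    landmarks = rng.choice(len(filters), size=n_landmarks, replace=False)
    C = normed @ normed[landmarks].T        # similarities to landmarks, shape (n, m)
    W = C[landmarks]                        # landmark-landmark block, shape (m, m)
    return C @ np.linalg.pinv(W) @ C.T      # approximation of the full (n, n) matrix

# Example: 256 flattened filters; only 16 columns are computed exactly.
rng = np.random.default_rng(1)
f = rng.standard_normal((256, 64 * 3 * 3))
S_approx = nystrom_similarity(f, n_landmarks=16)
```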
Abstract:This technical report describes the SurreyAudioTeam22 submission for DCASE 2022 ASC Task 1, Low-Complexity Acoustic Scene Classification (ASC). The task has two rules: (a) the ASC framework should have at most 128K parameters, and (b) there should be at most 30 million multiply-accumulate operations (MACs) per inference. In this report, we present low-complexity systems for ASC that follow the rules intended for the task.
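As a rough sketch of how such complexity limits can be checked, the snippet below counts parameters and estimates convolutional MACs per inference with PyTorch forward hooks. It only accounts for Conv2d layers, uses a toy model invented for illustration, and is not the official DCASE counting tool.

```python
import torch
import torch.nn as nn

def count_params(model: nn.Module) -> int:
    """Total number of trainable and non-trainable parameters."""
    return sum(p.numel() for p in model.parameters())

def estimate_conv_macs(model: nn.Module, example_input: torch.Tensor) -> int:
    """Estimate MACs of Conv2d layers only (other layer types are ignored)."""
    macs = 0

    def hook(module, inputs, output):
        nonlocal macs
        kh, kw = module.kernel_size
        macs_per_position = kh * kw * (module.in_channels // module.groups)
        macs += output.numel() * macs_per_position

    handles = [m.register_forward_hook(hook)
               for m in model.modules() if isinstance(m, nn.Conv2d)]
    with torch.no_grad():
        model(example_input)
    for h in handles:
        h.remove()
    return macs

# Example check against the task limits (128K parameters, 30M MACs).
model = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10))
x = torch.randn(1, 1, 64, 64)
print(count_params(model), estimate_conv_macs(model, x))
```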