Abstract:Learning the graph topology of a complex network is challenging due to limited data availability and imprecise data models. A common remedy in existing works is to incorporate priors such as sparsity or modularity, which highlight the structural properties of the graph topology. We depart from these approaches to develop priors that are directly inspired by complex network dynamics. Focusing on social networks in which agents' actions are modeled as equilibria of linear quadratic games, we postulate that social network topologies are optimized with respect to a social welfare function. Utilizing this prior knowledge, we propose a network-game-induced regularizer to assist graph learning. We then formulate the graph topology learning problem as a bilevel program and develop a two-timescale gradient algorithm to tackle it. We draw theoretical insights on the optimal graph structure of the bilevel program and show that they agree with the topologies of several man-made networks. Empirically, we demonstrate that the proposed formulation yields reliable estimates of the graph topology.
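The abstract does not spell out the game, so the following is a minimal sketch under an assumption. It uses the textbook linear quadratic network game, in which agent i has utility u_i(x) = b_i x_i - x_i^2/2 + alpha x_i (A x)_i. That specific utility and the names `b` and `alpha` are assumptions and may differ from the paper's model.

Setting each agent's best response to zero gives x = b + alpha A x. So the equilibrium is x* = (I - alpha A)^{-1} b whenever alpha times the spectral radius of A is below 1. The utilitarian welfare at that equilibrium then simplifies to ||x*||^2 / 2. The paper's actual regularizer and bilevel program are not reproduced here.

```python
import numpy as np

def lq_equilibrium(A, b, alpha):
    """Nash equilibrium x* = (I - alpha*A)^{-1} b of the assumed linear quadratic game."""
    n = A.shape[0]
    rho = np.max(np.abs(np.linalg.eigvals(A)))
    if alpha * rho >= 1:
        raise ValueError("need alpha * spectral_radius(A) < 1 for a unique equilibrium")
    return np.linalg.solve(np.eye(n) - alpha * A, b)

def utilities(A, b, alpha, x):
    """Assumed per-agent utility u_i = b_i x_i - x_i^2 / 2 + alpha * x_i * (A x)_i."""
    return b * x - 0.5 * x**2 + alpha * x * (A @ x)

def social_welfare(A, b, alpha):
    """Sum of utilities at equilibrium (equals 0.5 * ||x*||^2 under this model)."""
    x = lq_equilibrium(A, b, alpha)
    return utilities(A, b, alpha, x).sum()

# Compare the equilibrium welfare of two 4-node topologies with identical standalone values b.
path = np.diag(np.ones(3), 1)
path = path + path.T
complete = np.ones((4, 4)) - np.eye(4)
b = np.ones(4)
for name, A in [("path", path), ("complete", complete)]:
    print(name, social_welfare(A, b, alpha=0.2))
```

Under this model, a welfare-based prior favors topologies whose equilibrium actions are larger in norm. A regularizer built on `social_welfare` would need to be differentiated through `lq_equilibrium`, which is where a bilevel formulation comes in.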
Abstract:Previous Multimodal Information based Speech Processing (MISP) challenges mainly focused on audio-visual speech recognition (AVSR), with commendable success. However, even the most advanced back-end recognition systems often hit performance limits in complex acoustic environments. This has prompted a shift in focus towards the Audio-Visual Target Speaker Extraction (AVTSE) task for the MISP 2023 challenge in the ICASSP 2024 Signal Processing Grand Challenges. Unlike existing audio-visual speech enhancement challenges, which primarily focused on simulated data, the MISP 2023 challenge uniquely explores how front-end speech processing, combined with visual cues, impacts back-end tasks in real-world scenarios. This pioneering effort aims to set the first benchmark for the AVTSE task, offering fresh insights into improving the accuracy of back-end speech recognition systems through AVTSE in challenging, real acoustic environments. This paper delivers a thorough overview of the task setting, dataset, and baseline system of the MISP 2023 challenge. It also includes an in-depth analysis of the difficulties participants may encounter. The experimental results highlight the demanding nature of this task, and we look forward to the innovative solutions participants will bring forward.
Abstract:This paper proposes a blind detection problem for low pass graph signals. Without assuming prior knowledge of the graph topology, we aim to detect whether a set of graph signal observations is generated by a low pass graph filter. Our problem is motivated by the widely adopted assumption of low pass (a.k.a. smooth) signals required by many existing works in graph signal processing (GSP), as well as by the longstanding problem of network dynamics identification. Focusing on detecting low pass graph signals whose cutoff frequency coincides with the number of clusters present, our key idea is to develop blind detectors that leverage the unique spectral pattern exhibited by low pass graph signals. We analyze the sample complexity of these detectors, considering the effects of the graph filter's properties and of random delays. We show novel applications of the blind detector to robustifying graph learning, identifying antagonistic ties in opinion dynamics, and detecting anomalies in power systems. Numerical experiments validate our findings.
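The spectral pattern behind these detectors can be checked numerically when the graph is known. This check is not the paper's blind detector, which by design cannot use the Laplacian.

For a K-low pass filter, the top-K eigenvectors of the signal covariance should span the same subspace as the bottom-K Laplacian eigenvectors. A high pass filter should not show this alignment.

The following choices are all illustrative assumptions:

- the ring-of-cliques graph;
- the filters (I + L)^{-1} (low pass) and L (high pass);
- the white excitation;
- the sample size;
- the chordal-distance score.

```python
import numpy as np

def ring_of_cliques(num_clusters=3, size=5):
    """Adjacency of `num_clusters` cliques joined in a ring by single bridge edges."""
    n = num_clusters * size
    A = np.zeros((n, n))
    for c in range(num_clusters):
        lo, hi = c * size, (c + 1) * size
        A[lo:hi, lo:hi] = 1.0
    np.fill_diagonal(A, 0.0)
    for c in range(num_clusters):
        i = (c + 1) * size - 1          # last node of cluster c
        j = ((c + 1) * size) % n        # first node of the next cluster
        A[i, j] = A[j, i] = 1.0
    return A

def top_k_cov_eigvecs(X, k):
    """Top-k eigenvectors of the sample covariance of the columns of X."""
    C = X @ X.T / X.shape[1]
    _, vecs = np.linalg.eigh(C)         # eigenvalues come back in ascending order
    return vecs[:, -k:]

def subspace_distance(U, V):
    """Normalized chordal distance between the column spaces of U and V, in [0, 1]."""
    k = U.shape[1]
    return np.linalg.norm(U @ U.T - V @ V.T, "fro") / np.sqrt(2 * k)

rng = np.random.default_rng(0)
K = 3                                   # number of clusters = assumed cutoff index
A = ring_of_cliques(K, 5)
L = np.diag(A.sum(axis=1)) - A          # combinatorial Laplacian
n = L.shape[0]
W = rng.standard_normal((n, 20000))     # white excitation, one column per sample

V = np.linalg.eigh(L)[1][:, :K]         # bottom-K Laplacian eigenvectors (low frequencies)
H_low = np.linalg.inv(np.eye(n) + L)    # a low pass filter
H_high = L                              # a high pass filter

d_low = subspace_distance(top_k_cov_eigvecs(H_low @ W, K), V)
d_high = subspace_distance(top_k_cov_eigvecs(H_high @ W, K), V)
print(f"low pass: {d_low:.3f}, high pass: {d_high:.3f}")
```

A small `d_low` and a `d_high` near 1 reflect the pattern. A blind detector has to infer the same separation from the observations alone, for example from how clusterable the top-K covariance eigenvectors are.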
Abstract:This paper considers learning a product graph from multi-attribute graph signals. Our work is motivated by the widespread presence of multilayer networks that feature interactions within and across graph layers. Focusing on a product graph setting with homogeneous layers, we propose a bivariate polynomial graph filter model. We then consider the topology inference problem by adapting existing spectral methods. We propose two solutions for the required spectral estimation step: a simplified solution that unfolds the multi-attribute data into matrices, and an exact solution based on nearest Kronecker product decomposition (NKD). Interestingly, we show that strong inter-layer coupling can degrade the performance of the unfolding solution, while the NKD solution is robust to inter-layer coupling effects. Numerical experiments show the efficacy of our methods.
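The abstract names nearest Kronecker product decomposition without giving details. A standard way to compute the nearest Kronecker product in the Frobenius norm is the Van Loan-Pitsianis method: rearrange the matrix, then take a rank-1 SVD.

The sketch below shows that textbook construction. It assumes the matrix to be factored, for example a covariance of multi-attribute signals, is available as a dense array. It is not necessarily the estimator used in the paper, and the function names are hypothetical.

```python
import numpy as np

def rearrange(A, m1, n1, m2, n2):
    """Van Loan-Pitsianis rearrangement: row i*n1 + j holds the (i, j) block of A, flattened."""
    R = np.empty((m1 * n1, m2 * n2))
    for i in range(m1):
        for j in range(n1):
            R[i * n1 + j] = A[i * m2:(i + 1) * m2, j * n2:(j + 1) * n2].reshape(-1)
    return R

def nearest_kronecker(A, m1, n1, m2, n2):
    """Return (B, C) minimizing ||A - kron(B, C)||_F with B: m1 x n1 and C: m2 x n2."""
    if A.shape != (m1 * m2, n1 * n2):
        raise ValueError("A must have shape (m1 * m2, n1 * n2)")
    R = rearrange(A, m1, n1, m2, n2)
    # Rearrangement only permutes entries, so the best Kronecker fit of A is
    # the best rank-1 fit of R, given by its leading singular triplet.
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    B = np.sqrt(s[0]) * U[:, 0].reshape(m1, n1)
    C = np.sqrt(s[0]) * Vt[0].reshape(m2, n2)
    return B, C

rng = np.random.default_rng(0)
B_true = rng.standard_normal((3, 3))
C_true = rng.standard_normal((4, 4))
A = np.kron(B_true, C_true) + 1e-3 * rng.standard_normal((12, 12))
B_hat, C_hat = nearest_kronecker(A, 3, 3, 4, 4)
K_true = np.kron(B_true, C_true)
rel_err = np.linalg.norm(np.kron(B_hat, C_hat) - K_true) / np.linalg.norm(K_true)
print(f"relative error of the recovered Kronecker product: {rel_err:.2e}")
```

The factors are identifiable only up to a scalar, since (cB) ⊗ (C/c) = B ⊗ C. This is why the check compares the Kronecker product rather than B and C individually.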
Abstract:Weakly Supervised Semantic Segmentation (WSSS) based on image-level labels has been greatly advanced by using the outputs of Class Activation Maps (CAMs) to generate pseudo labels for semantic segmentation. However, CAM discovers seeds from only a small number of regions, which may be insufficient to serve as pseudo masks for semantic segmentation. In this paper, we formulate the expansion of object regions in CAM as an increase in information. From the perspective of information theory, we propose a novel Complementary Patch (CP) Representation and prove that the information of the sum of the CAMs generated from a pair of input images with complementary hidden (patched) parts, namely a CP Pair, is greater than or equal to the information of the baseline CAM. Therefore, a CAM with more information related to object seeds can be obtained by narrowing the gap between the sum of the CAMs generated by the CP Pair and the original CAM. We propose a CP Network (CPN), consisting of a triplet network and three regularization functions. To further improve the quality of the CAMs, we propose a Pixel-Region Correlation Module (PRCM) that augments contextual information by using object-region relations between the feature maps and the CAMs. Experimental results on the PASCAL VOC 2012 dataset show that our proposed method achieves a new state of the art in WSSS, validating the effectiveness of our CP Representation and CPN.
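The CP Pair construction can be sketched as two complementary patch masks: every patch is visible in exactly one image of the pair, so the two masked images sum back to the original. This is a minimal sketch under several assumptions:

- the square patch grid;
- the 50% hiding ratio;
- the mean-absolute gap used in `cp_gap`, a hypothetical stand-in for the paper's three regularization functions, which the abstract does not define.

No CAM network is included; any model producing H x W CAMs for `image`, `x1`, and `x2` could be plugged into `cp_gap`.

```python
import numpy as np

def complementary_patch_masks(height, width, patch, hide_ratio=0.5, rng=None):
    """Binary visibility masks (m1, m2) with m1 + m2 == 1: every patch is visible in exactly one image."""
    rng = np.random.default_rng() if rng is None else rng
    rows, cols = -(-height // patch), -(-width // patch)     # ceil division: patch grid size
    hidden = (rng.random((rows, cols)) < hide_ratio).astype(np.float32)  # 1 = hidden in first image
    hidden = np.kron(hidden, np.ones((patch, patch), dtype=np.float32))[:height, :width]
    m1 = 1.0 - hidden
    return m1, 1.0 - m1

def cp_pair(image, patch=16, hide_ratio=0.5, rng=None):
    """Split an H x W x C image into a complementary-patch pair."""
    h, w = image.shape[:2]
    m1, m2 = complementary_patch_masks(h, w, patch, hide_ratio, rng)
    return image * m1[..., None], image * m2[..., None]

def cp_gap(cam_pair_1, cam_pair_2, cam_full):
    """Hypothetical mean-absolute gap between the summed CP-pair CAMs and the full-image CAM."""
    return float(np.abs(cam_pair_1 + cam_pair_2 - cam_full).mean())

rng = np.random.default_rng(0)
image = rng.random((50, 70, 3))
x1, x2 = cp_pair(image, patch=16, rng=rng)
print(np.allclose(x1 + x2, image))   # prints True: the pair tiles the original image
```

Because the masks partition the image, each object region is visible in at least one image of the pair. This is the property that summing the pair's CAMs relies on.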