Abstract:Traditional knowledge-based situation awareness (SA) modes struggle to adapt to the escalating complexity of today's Energy Internet of Things (EIoT), necessitating a pivotal paradigm shift. In response, this work introduces a pioneering data-driven SA framework, termed digital twin-based situation awareness (DT-SA), aiming to bridge existing gaps between data and demands, and further to enhance SA capabilities within the complex EIoT landscape. First, we redefine the concept of digital twin (DT) within the EIoT context, aligning it with data-intensive scientific discovery paradigm (the Fourth Paradigm) so as to waken EIoT's sleeping data; this contextual redefinition lays the cornerstone of our DT-SA framework for EIoT. Then, the framework is comprehensively explored through its four fundamental steps: digitalization, simulation, informatization, and intellectualization. These steps initiate a virtual ecosystem conducive to a continuously self-adaptive, self-learning, and self-evolving big model (BM), further contributing to the evolution and effectiveness of DT-SA in engineering. Our framework is characterized by the incorporation of system theory and Fourth Paradigm as guiding ideologies, DT as data engine, and BM as intelligence engine. This unique combination forms the backbone of our approach. This work extends beyond engineering, stepping into the domain of data science -- DT-SA not only enhances management practices for EIoT users/operators, but also propels advancements in pattern analysis and machine intelligence (PAMI) within the intricate fabric of a complex system. Numerous real-world cases validate our DT-SA framework.
Abstract:In this paper, we propose a novel network, SVDFormer, to tackle two specific challenges in point cloud completion: understanding faithful global shapes from incomplete point clouds and generating high-accuracy local structures. Current methods either perceive shape patterns using only 3D coordinates or import extra images with well-calibrated intrinsic parameters to guide the geometry estimation of the missing parts. However, these approaches do not always fully leverage the cross-modal self-structures available for accurate and high-quality point cloud completion. To this end, we first design a Self-view Fusion Network that leverages multiple-view depth image information to observe incomplete self-shape and generate a compact global shape. To reveal highly detailed structures, we then introduce a refinement module, called Self-structure Dual-generator, in which we incorporate learned shape priors and geometric self-similarities for producing new points. By perceiving the incompleteness of each point, the dual-path design disentangles refinement strategies conditioned on the structural type of each point. SVDFormer absorbs the wisdom of self-structures, avoiding any additional paired information such as color images with precisely calibrated camera intrinsic parameters. Comprehensive experiments indicate that our method achieves state-of-the-art performance on widely-used benchmarks. Code will be available at https://github.com/czvvd/SVDFormer.
Abstract:Although industrial anomaly detection (AD) technology has made significant progress in recent years, generating realistic anomalies and learning priors knowledge of normal remain challenging tasks. In this study, we propose an end-to-end industrial anomaly segmentation method called FractalAD. Training samples are obtained by synthesizing fractal images and patches from normal samples. This fractal anomaly generation method is designed to sample the full morphology of anomalies. Moreover, we designed a backbone knowledge distillation structure to extract prior knowledge contained in normal samples. The differences between a teacher and a student model are converted into anomaly attention using a cosine similarity attention module. The proposed method enables an end-to-end semantic segmentation network to be used for anomaly detection without adding any trainable parameters to the backbone and segmentation head. The results of ablation studies confirmed the effectiveness of fractal anomaly generation and backbone knowledge distillation. The results of performance experiments showed that FractalAD achieved competitive results on the MVTec AD dataset compared with other state-of-the-art anomaly detection methods.
Abstract:XiaoiceSing is a singing voice synthesis (SVS) system that aims at generating 48kHz singing voices. However, the mel-spectrogram generated by it is over-smoothing in middle- and high-frequency areas due to no special design for modeling the details of these parts. In this paper, we propose XiaoiceSing2, which can generate the details of middle- and high-frequency parts to better construct the full-band mel-spectrogram. Specifically, in order to alleviate this problem, XiaoiceSing2 adopts a generative adversarial network (GAN), which consists of a FastSpeech-based generator and a multi-band discriminator. We improve the feed-forward Transformer (FFT) block by adding multiple residual convolutional blocks in parallel with the self-attention block to balance the local and global features. The multi-band discriminator contains three sub-discriminators responsible for low-, middle-, and high-frequency parts of the mel-spectrogram, respectively. Each sub-discriminator is composed of several segment discriminators (SD) and detail discriminators (DD) to distinguish the audio from different aspects. The experiment on our internal 48kHz singing voice dataset shows XiaoiceSing2 significantly improves the quality of the singing voice over XiaoiceSing.
Abstract:Entertainment-oriented singing voice synthesis (SVS) requires a vocoder to generate high-fidelity (e.g. 48kHz) audio. However, most text-to-speech (TTS) vocoders cannot work well in this scenario even if the neural vocoder for TTS has achieved significant progress. In this paper, we propose HiFi-WaveGAN which is designed for synthesizing the 48kHz high-quality singing voices from the full-band mel-spectrogram in real-time. Specifically, it consists of a generator improved from WaveNet, a multi-period discriminator same to HiFiGAN, and a multi-resolution spectrogram discriminator borrowed from UnivNet. To better reconstruct the high-frequency part from the full-band mel-spectrogram, we design a novel auxiliary spectrogram-phase loss to train the neural network, which can also accelerate the training process. The experimental result shows that our proposed HiFi-WaveGAN significantly outperforms other neural vocoders such as Parallel WaveGAN (PWG) and HiFiGAN in the mean opinion score (MOS) metric for the 48kHz SVS task. And a comparative study of HiFi-WaveGAN with/without phase loss term proves that phase loss indeed improves the training speed. Besides, we also compare the spectrogram generated by our HiFi-WaveGAN and PWG, which shows our HiFi-WaveGAN has a more powerful ability to model the high-frequency parts.
Abstract:As a kind of generative self-supervised learning methods, generative adversarial nets have been widely studied in the field of anomaly detection. However, the representation learning ability of the generator is limited since it pays too much attention to pixel-level details, and generator is difficult to learn abstract semantic representations from label prediction pretext tasks as effective as discriminator. In order to improve the representation learning ability of generator, we propose a self-supervised learning framework combining generative methods and discriminative methods. The generator no longer learns representation by reconstruction error, but the guidance of discriminator, and could benefit from pretext tasks designed for discriminative methods. Our discriminative-generative representation learning method has performance close to discriminative methods and has a great advantage in speed. Our method used in one-class anomaly detection task significantly outperforms several state-of-the-arts on multiple benchmark data sets, increases the performance of the top-performing GAN-based baseline by 6% on CIFAR-10 and 2% on MVTAD.
Abstract:The location of broken insulators in aerial images is a challenging task. This paper, focusing on the self-blast glass insulator, proposes a deep learning solution. We address the broken insulators location problem as a low signal-noise-ratio image location framework with two modules: 1) object detection based on Fast R-CNN, and 2) classification of pixels based on U-net. A diverse aerial image set of some grid in China is tested to validated the proposed approach. Furthermore, a comparison is made among different methods and the result shows that our approach is accurate and real-time.
Abstract:Power systems are developing very fast nowadays, both in size and in complexity; this situation is a challenge for Early Event Detection (EED). This paper proposes a data- driven unsupervised learning method to handle this challenge. Specifically, the random matrix theories (RMTs) are introduced as the statistical foundations for random matrix models (RMMs); based on the RMMs, linear eigenvalue statistics (LESs) are defined via the test functions as the system indicators. By comparing the values of the LES between the experimental and the theoretical ones, the anomaly detection is conducted. Furthermore, we develop 3D power-map to visualize the LES; it provides a robust auxiliary decision-making mechanism to the operators. In this sense, the proposed method conducts EED with a pure statistical procedure, requiring no knowledge of system topologies, unit operation/control models, etc. The LES, as a key ingredient during this procedure, is a high dimensional indictor derived directly from raw data. As an unsupervised learning indicator, the LES is much more sensitive than the low dimensional indictors obtained from supervised learning. With the statistical procedure, the proposed method is universal and fast; moreover, it is robust against traditional EED challenges (such as error accumulations, spurious correlations, and even bad data in core area). Case studies, with both simulated data and real ones, validate the proposed method. To manage large-scale distributed systems, data fusion is mentioned as another data processing ingredient.