Abstract:When editing a video, a piece of attractive background music is indispensable. However, video background music generation tasks face several challenges, for example, the lack of suitable training datasets, and the difficulties in flexibly controlling the music generation process and sequentially aligning the video and music. In this work, we first propose a high-quality music-video dataset BGM909 with detailed annotation and shot detection to provide multi-modal information about the video and music. We then present evaluation metrics to assess music quality, including music diversity and alignment between music and video with retrieval precision metrics. Finally, we propose the Diff-BGM framework to automatically generate the background music for a given video, which uses different signals to control different aspects of the music during the generation process, i.e., uses dynamic video features to control music rhythm and semantic features to control the melody and atmosphere. We propose to align the video and music sequentially by introducing a segment-aware cross-attention layer. Experiments verify the effectiveness of our proposed method. The code and models are available at https://github.com/sizhelee/Diff-BGM.
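One way to picture a segment-aware cross-attention layer is as ordinary scaled dot-product attention with a mask: each music position attends only to video features from its own segment (for example, the same shot). The sketch below assumes that reading and is not the Diff-BGM implementation. The function name, the segment-id lists, and the omission of learned projections are all illustrative assumptions.

```python
import math

def segment_masked_attention(queries, keys, values, q_seg, k_seg):
    """Scaled dot-product attention where each query only sees keys whose
    segment id matches its own. Hypothetical helper, not from the paper."""
    dim_out = len(values[0])
    outputs = []
    for q, q_id in zip(queries, q_seg):
        # Indices of video features that belong to the query's segment.
        allowed = [j for j, k_id in enumerate(k_seg) if k_id == q_id]
        if not allowed:
            # Assumption: a query with no matching segment gets a zero vector.
            outputs.append([0.0] * dim_out)
            continue
        scale = math.sqrt(len(q))
        scores = [sum(a * b for a, b in zip(q, keys[j])) / scale for j in allowed]
        # Numerically stable softmax over the allowed positions only.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        outputs.append([
            sum(w / total * values[j][d] for w, j in zip(weights, allowed))
            for d in range(dim_out)
        ])
    return outputs
```

With one video feature per segment, each music position simply copies the value vector of its own segment. With several features per segment, it returns their attention-weighted average.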
Abstract:We introduce pixelSplat, a feed-forward model that learns to reconstruct 3D radiance fields parameterized by 3D Gaussian primitives from pairs of images. Our model features real-time and memory-efficient rendering for scalable training as well as fast 3D reconstruction at inference time. To overcome local minima inherent to sparse and locally supported representations, we predict a dense probability distribution over 3D and sample Gaussian means from that probability distribution. We make this sampling operation differentiable via a reparameterization trick, allowing us to back-propagate gradients through the Gaussian splatting representation. We benchmark our method on wide-baseline novel view synthesis on the real-world RealEstate10k and ACID datasets, where we outperform state-of-the-art light field transformers and accelerate rendering by 2.5 orders of magnitude while reconstructing an interpretable and editable 3D radiance field.
Abstract:A novel method for detecting faults in power grids using a graph neural network (GNN) has been developed, aimed at enhancing intelligent fault diagnosis in network operation and maintenance. This GNN-based approach identifies faulty nodes within the power grid through a specialized electrical feature extraction model coupled with a knowledge graph. Incorporating temporal data, the method leverages the status of nodes from preceding and subsequent time periods to aid in current fault detection. To validate the effectiveness of this GNN in extracting node features, a correlation analysis of the output features from each node within the neural network layer was conducted. The results from experiments show that this method can accurately locate fault nodes in simulated scenarios with a remarkable 99.53% accuracy. Additionally, the graph neural network's feature modeling allows for a qualitative examination of how faults spread across nodes, providing valuable insights for analyzing fault nodes.
Abstract:To address the challenge of identifying and understanding hidden dangers in substations from unstructured text data, a novel dynamic analysis method is proposed. The approach first analyzes and extracts data from unstructured text describing hidden dangers, then stores this information in a flexible, distributed search engine built on Elasticsearch. A hidden Markov model is then trained on the data in the engine, and the Viterbi algorithm is used to decode the hidden state sequences, which enables the segmentation and labeling of entities related to hidden dangers. Finally, the Neo4j graph database is used to dynamically build a knowledge graph that visualizes the hidden dangers in the substation. The method's effectiveness is demonstrated through an example analysis using hidden-danger data from a specific substation.
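The decoding step can be illustrated with a minimal Viterbi sketch for a hidden Markov model. The two-state tag set (`ENT` for a hidden-danger entity token, `O` for everything else), the tokens, and all probabilities in the usage note are toy assumptions for illustration. They do not come from the paper.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for the observations."""
    def log(p):
        # Work in log space to avoid underflow; zero probability maps to -inf.
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path that ends in state s at step t.
    best = [{s: log(start_p[s]) + log(emit_p[s].get(observations[0], 0.0))
             for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor state that maximizes the path score into s.
            prev, score = max(
                ((p, best[t - 1][p] + log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log(emit_p[s].get(observations[t], 0.0))
            back[t][s] = prev

    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path
```

For example, with `start_p = {"ENT": 0.5, "O": 0.5}`, transitions `{"ENT": {"ENT": 0.6, "O": 0.4}, "O": {"ENT": 0.3, "O": 0.7}}`, and emissions that favor "bushing" and "crack" under `ENT` and "was" and "found" under `O`, the tokens `["bushing", "crack", "was", "found"]` decode to `["ENT", "ENT", "O", "O"]`.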
Abstract:Implementing precise detection of oil leaks in peak load equipment through image analysis can significantly enhance inspection quality and ensure the system's safety and reliability. However, challenges such as varying shapes of oil-stained regions, background noise, and fluctuating lighting conditions complicate the detection process. To address this, the integration of logical rule-based discrimination into image recognition has been proposed. This approach involves recognizing the spatial relationships among objects to semantically segment images of oil spills using a Mask RCNN network. The process begins with histogram equalization to enhance the original image, followed by the use of Mask RCNN to identify the preliminary positions and outlines of oil tanks, the ground, and areas of potential oil contamination. Subsequent to this identification, the spatial relationships between these objects are analyzed. Logical rules are then applied to ascertain whether the suspected areas are indeed oil spills. This method's effectiveness has been confirmed by testing on images captured from peak power equipment in the field. The results indicate that this approach can adeptly tackle the challenges in identifying oil-contaminated areas, showing a substantial improvement in accuracy compared to existing methods.
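The rule-based discrimination step can be sketched on bounding boxes, as a segmentation network might produce them. The three rules below, the thresholds, and the `Box` type are hypothetical stand-ins chosen for illustration, since the abstract does not state the actual rules. Image coordinates are assumed to have y increasing downward.

```python
from typing import NamedTuple

class Box(NamedTuple):
    """Axis-aligned bounding box in pixel coordinates (y grows downward)."""
    x1: float
    y1: float
    x2: float
    y2: float

def area(box):
    return max(box.x2 - box.x1, 0.0) * max(box.y2 - box.y1, 0.0)

def overlap_area(a, b):
    width = min(a.x2, b.x2) - max(a.x1, b.x1)
    height = min(a.y2, b.y2) - max(a.y1, b.y1)
    return max(width, 0.0) * max(height, 0.0)

def is_oil_spill(suspect, tank, ground, min_ground_ratio=0.5, max_gap=20.0):
    """Apply illustrative spatial rules to a suspected oil region."""
    if area(suspect) == 0:
        return False
    # Rule 1 (assumed): most of the suspected region lies on the ground.
    on_ground = overlap_area(suspect, ground) / area(suspect) >= min_ground_ratio
    # Rule 2 (assumed): the region overlaps the tank's horizontal extent.
    under_tank = min(suspect.x2, tank.x2) > max(suspect.x1, tank.x1)
    # Rule 3 (assumed): the region starts at most max_gap pixels below the tank.
    near_tank = suspect.y1 - tank.y2 <= max_gap
    return on_ground and under_tank and near_tank
```

A dark patch directly under a tank passes all three rules. A similar patch far to the side of the tank fails Rule 2 and is rejected as a likely shadow or stain from another source.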
Abstract:The proposed method for linking entities in power distribution dispatch texts to a power distribution network knowledge graph is based on a deep understanding of these networks. This method leverages the unique features of entities in both the power distribution network's knowledge graph and the dispatch texts, focusing on their semantic, phonetic, and syntactic characteristics. An enhanced model, the Lexical Semantic Feature-based Skip Convolutional Neural Network (LSF-SCNN), is utilized for effectively matching dispatch text entities with those in the knowledge graph. The efficacy of this model, compared to a control model, is evaluated through cross-validation methods in real-world power distribution dispatch scenarios. The results indicate that the LSF-SCNN model excels in accurately linking a variety of entity types, demonstrating high overall accuracy in entity linking when the process is conducted in English.
Abstract:This paper introduces a robust, learning-based method for diagnosing the state of distribution network switchgear, which is crucial for maintaining the power quality for end users. Traditional diagnostic models often rely heavily on expert knowledge and lack robustness. To address this, our method incorporates an expanded feature vector that includes environmental data, temperature readings, switch position, motor operation, insulation conditions, and local discharge information. We tackle the issue of high dimensionality through feature mapping. The method introduces a decision radius to categorize unlabeled samples and updates the model parameters using a combination of supervised and unsupervised loss, along with a consistency regularization function. This approach ensures robust learning even with a limited number of labeled samples. Comparative analysis demonstrates that this method significantly outperforms existing models in both accuracy and robustness.
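The decision-radius idea can be sketched as follows, assuming class centroids computed from the labeled samples and a Euclidean distance threshold. An unlabeled sample receives a pseudo-label only if it falls within the radius of its nearest centroid. The function names and the choice of Euclidean distance are assumptions, and the sketch leaves out the combined supervised and unsupervised loss and the consistency regularization mentioned in the abstract.

```python
import math
from collections import defaultdict

def class_centroids(samples, labels):
    """Mean feature vector per class, computed from the labeled samples."""
    groups = defaultdict(list)
    for x, y in zip(samples, labels):
        groups[y].append(x)
    return {y: [sum(col) / len(xs) for col in zip(*xs)] for y, xs in groups.items()}

def pseudo_label(unlabeled, centroids, radius):
    """Label each sample with its nearest class if it lies within `radius`;
    otherwise return None so the sample stays unlabeled."""
    result = []
    for x in unlabeled:
        label, dist = min(
            ((y, math.dist(x, c)) for y, c in centroids.items()),
            key=lambda item: item[1],
        )
        result.append(label if dist <= radius else None)
    return result
```

A small radius admits only samples close to a known class, which limits the risk of reinforcing wrong pseudo-labels when few labeled samples are available.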
Abstract:The current trend of automating inspections at substations has sparked a surge of interest in transformer image recognition. However, because existing models are limited in their number of parameters, high-resolution images cannot be used directly, leaving significant room for improving recognition accuracy. To address this challenge, the paper introduces an improved deep self-attention network tailored to the problem. The proposed model comprises four key components: a foundational network, a region proposal network, a module for extracting and segmenting target areas, and a final prediction network. The approach decouples part localization from recognition, using low-resolution images for localization and then high-resolution images for recognition. Moreover, the prediction mechanism of the deep self-attention network incorporates the semantic context of images, resulting in substantially improved recognition performance. Comparative experiments show that this method outperforms two other prevalent target recognition models, offering a new perspective for automating electrical equipment inspections.
Abstract:Overhead line inspection greatly benefits from defect recognition using visible light imagery. Addressing the limitations of existing feature extraction techniques and the heavy data dependency of deep learning approaches, this paper introduces a novel defect recognition framework built on the Faster RCNN network and complemented by unsupervised semantic segmentation. The approach identifies the type and location of the target equipment, uses semantic segmentation to separate the device from its background, and finally applies similarity measures and logical rules to categorize the type of defect. Experimental results indicate that, when identifying issues in overhead lines, this methodology focuses on the equipment rather than on the defects. This leads to a notable improvement in accuracy and good adaptability, offering a fresh perspective for automating the inspection of distribution network equipment.
Abstract:Transmission line state assessment and prediction are of great significance for formulating sound operation and maintenance strategies and for raising the level of operation and maintenance. Because existing models cannot balance robustness against data demand, this paper proposes a state prediction method based on semi-supervised learning. First, for the expanded feature vector, the regular matrix is used to fill in the missing data, and the sparse coding problem is solved by representation learning. Then, a small number of labeled samples are used to initially determine the category centers of line segments in different defect states. Finally, the estimated parameters of the model are corrected using unlabeled samples. An example analysis shows that this method improves recognition accuracy and uses data more efficiently than existing models.
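The last two steps (initial category centers from a few labeled samples, then correction with unlabeled samples) resemble seeded k-means, which is the technique sketched below as a stand-in. It is not the paper's estimator. The function name, the Euclidean distance, and the fixed iteration count are illustrative assumptions.

```python
import math

def refine_centers(labeled, unlabeled, n_iter=10):
    """Seeded k-means sketch: `labeled` maps each state to its labeled
    feature vectors, and `unlabeled` is a list of feature vectors."""
    def mean(vectors):
        return [sum(col) / len(vectors) for col in zip(*vectors)]

    # Step 1: initial category centers come from the labeled samples alone.
    centers = {state: mean(vectors) for state, vectors in labeled.items()}

    for _ in range(n_iter):
        # Step 2: assign every unlabeled sample to its nearest center.
        assigned = {state: [] for state in centers}
        for x in unlabeled:
            nearest = min(centers, key=lambda s: math.dist(x, centers[s]))
            assigned[nearest].append(x)
        # Step 3: correct each center with labeled plus assigned samples.
        centers = {
            state: mean(list(labeled[state]) + assigned[state])
            for state in centers
        }
    return centers
```

Keeping the labeled samples in every update anchors each center to its known state, so a cluster cannot drift to a different defect category as unlabeled data are added.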