Abstract: Singing voice conversion (SVC) aims to convert a singer's voice in a given music piece into that of another singer while keeping the original content. We propose SaMoye, an end-to-end model based on feature disentanglement, to enable zero-shot many-to-many singing voice conversion. SaMoye disentangles the features of the singing voice into content, timbre, and pitch features. The content features are enhanced using a GPT-based model that performs cross-prediction with the phonemes of the lyrics. SaMoye generates the music with the converted voice by replacing the timbre features with those of the target singer. We also establish an unparalleled large-scale dataset to guarantee zero-shot performance. The dataset consists of 1,500k pure singing vocal clips from at least 10,000 singers.
Abstract: Real-time emotion-based music arrangement, which aims to transform a given music piece into another one that evokes specific emotional resonance with the user in real time, holds significant application value in various scenarios, e.g., music therapy, video game soundtracks, and movie scores. However, balancing emotion real-time fit with soft emotion transition is challenging because the target emotion is fine-grained and mutable. Existing studies mainly focus on achieving emotion real-time fit, while the issue of soft transition remains understudied, which affects the overall emotional coherence of the music. In this paper, we propose SongDriver2 to address this balance. Specifically, we first recognize the last timestep's music emotion and then fuse it with the current timestep's target input emotion. The fused emotion then serves as the guidance for SongDriver2 to generate the upcoming music based on the input melody data. To adjust music similarity and emotion real-time fit flexibly, we downsample the original melody and feed it into the generation model. Furthermore, we design four music theory features to leverage domain knowledge to enhance emotion information, and we employ semi-supervised learning to mitigate the subjective bias introduced by manual dataset annotation. According to the evaluation results, SongDriver2 surpasses the state-of-the-art methods in both objective and subjective metrics. These results demonstrate that SongDriver2 achieves real-time fit and soft transitions simultaneously, enhancing the coherence of the generated music.
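The fusion and downsampling steps described in this abstract can be illustrated with a minimal sketch. The abstract does not specify how emotions are represented or fused, so everything here is an assumption made for illustration: emotions are modeled as (valence, arousal) pairs, fusion is a simple linear blend controlled by a hypothetical weight `alpha`, and downsampling keeps every `factor`-th note of a MIDI pitch sequence. The function names are hypothetical and are not taken from SongDriver2.

```python
from typing import List, Sequence, Tuple

# Assumed representation: an emotion is a (valence, arousal) pair.
Emotion = Tuple[float, float]


def fuse_emotion(previous: Emotion, target: Emotion, alpha: float = 0.5) -> Emotion:
    """Blend the last timestep's recognized emotion with the current target.

    alpha = 1.0 follows the target immediately (pure real-time fit);
    smaller values stay closer to the previous emotion (softer transition).
    The linear blend is an illustrative assumption, not the paper's method.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    valence = alpha * target[0] + (1.0 - alpha) * previous[0]
    arousal = alpha * target[1] + (1.0 - alpha) * previous[1]
    return (valence, arousal)


def downsample_melody(notes: Sequence[int], factor: int) -> List[int]:
    """Keep every `factor`-th note of a melody given as MIDI pitches.

    A larger factor passes less of the original melody to the generator.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    return list(notes[::factor])


if __name__ == "__main__":
    guidance = fuse_emotion(previous=(0.0, 0.0), target=(1.0, -1.0), alpha=0.25)
    sparse_melody = downsample_melody([60, 62, 64, 65, 67], factor=2)
    print(guidance, sparse_melody)
```

In this sketch, `alpha` and `factor` play the role of the two knobs the abstract mentions: one trades real-time fit against smooth transitions, the other trades similarity to the input melody against freedom for the generator.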
Abstract: To address the problem of online automatic inspection of drug liquid bottles on a production line, an implantable visual inspection system is designed and an ensemble learning algorithm for detection based on multi-feature fusion is proposed. A tunnel structure is designed for the visual inspection system, which allows bottle inspection to be automated without changing original
Abstract: This paper addresses the design, modeling, and dynamic-compensation PID (dc-PID) control of a novel fully-actuated aerial manipulation (AM) system. First, the design of the novel mechanical structure of the AM is presented. Second, the kinematics and dynamics of the AM are modeled using Craig parameters and recursive Newton-Euler equations, respectively, which yields a more accurate dynamic relationship between the aerial platform and the manipulator. Then, dynamic-compensation PID control is proposed to solve the problem of fully-actuated control of the AM. Finally, uniform coupled matrix equations between driving forces/moments and rotor speeds are derived, which support the design and analysis of parameters and decoupling in theory. The simulations take into account practical problems including noise and perturbation, parameter uncertainty, and power limitation, and the results show that the presented AM system can be controlled in a fully-actuated manner with advanced control performance, which cannot be achieved theoretically in traditional AM. Compared with backstepping control, dc-PID achieves better control accuracy and disturbance rejection in two simulations of aerial operation tasks with joint motion. The dc-PID experiment demonstrates the feasibility and effectiveness of the proposed method.
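The general idea of adding a model-based compensation term to a PID law can be sketched on a toy problem. This is not the controller from the paper: the plant below is a hypothetical one-dimensional point mass under gravity, the compensation term is simply the known weight `mass * gravity`, and the gains, class name, and function name are all illustrative assumptions. The sketch only shows why feeding a dynamics estimate forward lets the PID terms handle the residual error instead of the whole load.

```python
from dataclasses import dataclass


@dataclass
class CompensatedPID:
    """PID controller whose output is offset by a model-based compensation term."""

    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def update(self, error: float, dt: float, compensation: float = 0.0) -> float:
        # Standard PID terms computed from the tracking error.
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        pid = self.kp * error + self.ki * self.integral + self.kd * derivative
        # Feed the dynamics estimate forward so the PID handles only the residual.
        return pid + compensation


def simulate(compensate: bool, setpoint: float = 1.0, mass: float = 1.0,
             gravity: float = 9.81, dt: float = 0.001, steps: int = 5000) -> float:
    """Simulate a 1-D point mass under gravity and return its final height."""
    controller = CompensatedPID(kp=40.0, ki=5.0, kd=12.0)
    height, velocity = 0.0, 0.0
    for _ in range(steps):
        feedforward = mass * gravity if compensate else 0.0
        thrust = controller.update(setpoint - height, dt, feedforward)
        acceleration = thrust / mass - gravity
        velocity += acceleration * dt  # semi-implicit Euler integration
        height += velocity * dt
    return height


if __name__ == "__main__":
    print("with compensation:   ", simulate(compensate=True))
    print("without compensation:", simulate(compensate=False))
```

Without the compensation term, the integral action alone has to build up enough output to cancel gravity, so the toy plant is still noticeably off the setpoint at the end of the run; with it, the remaining error is much smaller.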