Abstract:Melody stuck in your head, also known as "earworm", is tough to get rid of, unless you listen to it again or sing it out loud. But what if you can not find the name of that song? It must be an intolerable feeling. Recognizing a song name base on humming sound is not an easy task for a human being and should be done by machines. However, there is no research paper published about hum tune recognition. Adapting from Hum2Song Zalo AI Challenge 2021 - a competition about querying the name of a song by user's giving humming tune, which is similar to Google's Hum to Search. This paper covers details about the pre-processed data from the original type (mp3) to usable form for training and inference. In training an embedding model for the feature extraction phase, we ran experiments with some states of the art, such as ResNet, VGG, AlexNet, MobileNetV2. And for the inference phase, we use the Faiss module to effectively search for a song that matched the sequence of humming sound. The result comes at nearly 94\% in MRR@10 metric on the public test set, along with the top 1 result on the public leaderboard.
Abstract:Multiplicative noise, also known as speckle or pepper noise, commonly affects images produced by synthetic aperture radar (SAR), lasers, or optical lenses. Unlike additive noise, which typically arises from thermal processes or external factors, multiplicative noise is inherent to the system, originating from the fluctuation in diffuse reflections. These fluctuations result in multiple copies of the same signal with varying magnitudes being combined. Consequently, despeckling, or removing multiplicative noise, necessitates different techniques compared to those used for additive noise removal. In this paper, we propose a novel approach using Stochastic Differential Equations based diffusion models to address multiplicative noise. We demonstrate that multiplicative noise can be effectively modeled as a Geometric Brownian Motion process in the logarithmic domain. Utilizing the Fokker-Planck equation, we derive the corresponding reverse process for image denoising. To validate our method, we conduct extensive experiments on two different datasets, comparing our approach to both classical signal processing techniques and contemporary CNN-based noise removal models. Our results indicate that the proposed method significantly outperforms existing methods on perception-based metrics such as FID and LPIPS, while maintaining competitive performance on traditional metrics like PSNR and SSIM.
Abstract:Federated Learning (FL) is a distributed machine learning approach that maintains data privacy by training on decentralized data sources. Similar to centralized machine learning, FL is also susceptible to backdoor attacks. Most backdoor attacks in FL assume a predefined target class and require control over a large number of clients or knowledge of benign clients' information. Furthermore, they are not imperceptible and are easily detected by human inspection due to clear artifacts left on the poison data. To overcome these challenges, we propose Venomancer, an effective backdoor attack that is imperceptible and allows target-on-demand. Specifically, imperceptibility is achieved by using a visual loss function to make the poison data visually indistinguishable from the original data. Target-on-demand property allows the attacker to choose arbitrary target classes via conditional adversarial training. Additionally, experiments showed that the method is robust against state-of-the-art defenses such as Norm Clipping, Weak DP, Krum, and Multi-Krum. The source code is available at https://anonymous.4open.science/r/Venomancer-3426.
Abstract:We present a deep-dive into a real-world robotic learning system that, in previous work, was shown to be capable of hundreds of table tennis rallies with a human and has the ability to precisely return the ball to desired targets. This system puts together a highly optimized perception subsystem, a high-speed low-latency robot controller, a simulation paradigm that can prevent damage in the real world and also train policies for zero-shot transfer, and automated real world environment resets that enable autonomous training and evaluation on physical robots. We complement a complete system description, including numerous design decisions that are typically not widely disseminated, with a collection of studies that clarify the importance of mitigating various sources of latency, accounting for training and deployment distribution shifts, robustness of the perception system, sensitivity to policy hyper-parameters, and choice of action space. A video demonstrating the components of the system and details of experimental results can be found at https://youtu.be/uFcnWjB42I0.