Abstract:While tangible user interface has shown its power in naturally interacting with rigid or soft objects, users cannot conveniently use different types of granular materials as the interaction media. We introduce DipMe as a smart device to recognize the types of granular media in real time, which can be used to connect the granular materials in the physical world with various virtual content. Other than vision-based solutions, we propose a dip operation of our device and exploit the haptic signals to recognize different types of granular materials. With modern machine learning tools, we find the haptic signals from different granular media are distinguishable by DipMe. With the online granular object recognition, we build several tangible interactive applications, demonstrating the effects of DipMe in perceiving granular materials and its potential in developing a tangible user interface with granular objects as the new media.
Abstract:We introduce Seed-Music, a suite of music generation systems capable of producing high-quality music with fine-grained style control. Our unified framework leverages both auto-regressive language modeling and diffusion approaches to support two key music creation workflows: \textit{controlled music generation} and \textit{post-production editing}. For controlled music generation, our system enables vocal music generation with performance controls from multi-modal inputs, including style descriptions, audio references, musical scores, and voice prompts. For post-production editing, it offers interactive tools for editing lyrics and vocal melodies directly in the generated audio. We encourage readers to listen to demo audio examples at https://team.doubao.com/seed-music .
Abstract:Decoding linguistic information from non-invasive brain signals using EEG has gained increasing research attention due to its vast applicational potential. Recently, a number of works have adopted a generative-based framework to decode electroencephalogram (EEG) signals into sentences by utilizing the power generative capacity of pretrained large language models (LLMs). However, this approach has several drawbacks that hinder the further development of linguistic applications for brain-computer interfaces (BCIs). Specifically, the ability of the EEG encoder to learn semantic information from EEG data remains questionable, and the LLM decoder's tendency to generate sentences based on its training memory can be hard to avoid. These issues necessitate a novel approach for converting EEG signals into sentences. In this paper, we propose a novel two-step pipeline that addresses these limitations and enhances the validity of linguistic EEG decoding research. We first confirm that word-level semantic information can be learned from EEG data recorded during natural reading by training a Conformer encoder via a masked contrastive objective for word-level classification. To achieve sentence decoding results, we employ a training-free retrieval method to retrieve sentences based on the predictions from the EEG encoder. Extensive experiments and ablation studies were conducted in this paper for a comprehensive evaluation of the proposed approach. Visualization of the top prediction candidates reveals that our model effectively groups EEG segments into semantic categories with similar meanings, thereby validating its ability to learn patterns from unspoken EEG recordings. Despite the exploratory nature of this work, these results suggest that our method holds promise for providing more reliable solutions for converting EEG signals into text.
Abstract:Due to circuit failures, defective elements that cannot adaptively adjust the phase shifts of their impinging signals in a desired manner may exist on an intelligent reflecting surface (IRS). Traditional way to find these defective IRS elements requires a thorough diagnosis of all the circuits belonging to a huge number of IRS elements, which is practically challenging. In this paper, we will devise a novel approach under which a transmitter sends known pilot signals and a receiver localizes all the defective IRS elements just based on its over-the-air measurements reflected from the IRS. The key lies in the fact that the over-the-air measurements at the receiver side are functions of the set of defective IRS elements. Based on this observation, we propose a bisection based method to localize all the defective IRS elements. Specifically, at each time slot, we properly control the desired phase shifts of all the IRS elements such that half of the considered regime that is not useful to localize the defective elements can be found based on the received signals and removed. Via numerical results, it is shown that our proposed bisection method can exploit the over-the-air measurements to localize all the defective IRS elements quickly and accurately.
Abstract:In this paper, we propose a technique for making humans in photographs protrude like reliefs. Unlike previous methods which mostly focus on the face and head, our method aims to generate art works that describe the whole body activity of the character. One challenge is that there is no ground-truth for supervised deep learning. We introduce a sigmoid variant function to manipulate gradients tactfully and train our neural networks by equipping with a loss function defined in gradient domain. The second challenge is that actual photographs often across different light conditions. We used image-based rendering technique to address this challenge and acquire rendering images and depth data under different lighting conditions. To make a clear division of labor in network modules, a two-scale architecture is proposed to create high-quality relief from a single photograph. Extensive experimental results on a variety of scenes show that our method is a highly effective solution for generating digital 2.5D artwork from photographs.
Abstract:Intelligent music generation, one of the most popular subfields of computer creativity, can lower the creative threshold for non-specialists and increase the efficiency of music creation. In the last five years, the quality of algorithm-based automatic music generation has increased significantly, motivated by the use of modern generative algorithms to learn the patterns implicit within a piece of music based on rule constraints or a musical corpus, thus generating music samples in various styles. Some of the available literature reviews lack a systematic benchmark of generative models and are traditional and conservative in their perspective, resulting in a vision of the future development of the field that is not deeply integrated with the current rapid scientific progress. In this paper, we conduct a comprehensive survey and analysis of recent intelligent music generation techniques,provide a critical discussion, explicitly identify their respective characteristics, and present them in a general table. We first introduce how music as a stream of information is encoded and the relevant datasets, then compare different types of generation algorithms, summarize their strengths and weaknesses, and discuss existing methods for evaluation. Finally, the development of artificial intelligence in composition is studied, especially by comparing the different characteristics of music generation techniques in the East and West and analyzing the development prospects in this field.
Abstract:Unsupervised self-rehabilitation exercises and physical training can cause serious injuries if performed incorrectly. We introduce a learning-based framework that identifies the mistakes made by a user and proposes corrective measures for easier and safer individual training. Our framework does not rely on hard-coded, heuristic rules. Instead, it learns them from data, which facilitates its adaptation to specific user needs. To this end, we use a Graph Convolutional Network (GCN) architecture acting on the user's pose sequence to model the relationship between the body joints trajectories. To evaluate our approach, we introduce a dataset with 3 different physical exercises. Our approach yields 90.9% mistake identification accuracy and successfully corrects 94.2% of the mistakes.
Abstract:The trajectory prediction is a critical and challenging problem in the design of an autonomous driving system. Many AI-oriented companies, such as Google Waymo, Uber and DiDi, are investigating more accurate vehicle trajectory prediction algorithms. However, the prediction performance is governed by lots of entangled factors, such as the stochastic behaviors of surrounding vehicles, historical information of self-trajectory, and relative positions of neighbors, etc. In this paper, we propose a novel graph-based information sharing network (GISNet) that allows the information sharing between the target vehicle and its surrounding vehicles. Meanwhile, the model encodes the historical trajectory information of all the vehicles in the scene. Experiments are carried out on the public NGSIM US-101 and I-80 Dataset and the prediction performance is measured by the Root Mean Square Error (RMSE). The quantitative and qualitative experimental results show that our model significantly improves the trajectory prediction accuracy, by up to 50.00%, compared to existing models.
Abstract:The number of daily sUAS operations in uncontrolled low altitude airspace is expected to reach into the millions in a few years. Therefore, UAS density prediction has become an emerging and challenging problem. In this paper, a deep learning-based UAS instantaneous density prediction model is presented. The model takes two types of data as input: 1) the historical density generated from the historical data, and 2) the future sUAS mission information. The architecture of our model contains four components: Historical Density Formulation module, UAS Mission Translation module, Mission Feature Extraction module, and Density Map Projection module. The training and testing data are generated by a python based simulator which is inspired by the multi-agent air traffic resource usage simulator (MATRUS) framework. The quality of prediction is measured by the correlation score and the Area Under the Receiver Operating Characteristics (AUROC) between the predicted value and simulated value. The experimental results demonstrate outstanding performance of the deep learning-based UAS density predictor. Compared to the baseline models, for simplified traffic scenario where no-fly zones and safe distance among sUASs are not considered, our model improves the prediction accuracy by more than 15.2% and its correlation score reaches 0.947. In a more realistic scenario, where the no-fly zone avoidance and the safe distance among sUASs are maintained using A* routing algorithm, our model can still achieve 0.823 correlation score. Meanwhile, the AUROC can reach 0.951 for the hot spot prediction.
Abstract:The recent discovered spatial-temporal information processing capability of bio-inspired Spiking neural networks (SNN) has enabled some interesting models and applications. However designing large-scale and high-performance model is yet a challenge due to the lack of robust training algorithms. A bio-plausible SNN model with spatial-temporal property is a complex dynamic system. Each synapse and neuron behave as filters capable of preserving temporal information. As such neuron dynamics and filter effects are ignored in existing training algorithms, the SNN downgrades into a memoryless system and loses the ability of temporal signal processing. Furthermore, spike timing plays an important role in information representation, but conventional rate-based spike coding models only consider spike trains statistically, and discard information carried by its temporal structures. To address the above issues, and exploit the temporal dynamics of SNNs, we formulate SNN as a network of infinite impulse response (IIR) filters with neuron nonlinearity. We proposed a training algorithm that is capable to learn spatial-temporal patterns by searching for the optimal synapse filter kernels and weights. The proposed model and training algorithm are applied to construct associative memories and classifiers for synthetic and public datasets including MNIST, NMNIST, DVS 128 etc.; and their accuracy outperforms state-of-art approaches.