Sid
Abstract:Musical mode is one of the most critical element that establishes the framework of pitch organization and determines the harmonic relationships. Previous works often use the simplistic and rigid alignment method, and overlook the diversity of modes. However, in contrast to AI models, humans possess cognitive mechanisms for perceiving the various modes and keys. In this paper, we propose a spiking neural network inspired by brain mechanisms and psychological theories to represent musical modes and keys, ultimately generating musical pieces that incorporate tonality features. Specifically, the contributions are detailed as follows: 1) The model is designed with multiple collaborated subsystems inspired by the structures and functions of corresponding brain regions; 2)We incorporate mechanisms for neural circuit evolutionary learning that enable the network to learn and generate mode-related features in music, reflecting the cognitive processes involved in human music perception. 3)The results demonstrate that the proposed model shows a connection framework closely similar to the Krumhansl-Schmuckler model, which is one of the most significant key perception models in the music psychology domain. 4) Experiments show that the model can generate music pieces with characteristics of the given modes and keys. Additionally, the quantitative assessments of generated pieces reveals that the generating music pieces have both tonality characteristics and the melodic adaptability needed to generate diverse and musical content. By combining insights from neuroscience, psychology, and music theory with advanced neural network architectures, our research aims to create a system that not only learns and generates music but also bridges the gap between human cognition and artificial intelligence.
Abstract:Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language models such as GPT-4 on a plethora of tasks. We publicly release Llama 3, including pre-trained and post-trained versions of the 405B parameter language model and our Llama Guard 3 model for input and output safety. The paper also presents the results of experiments in which we integrate image, video, and speech capabilities into Llama 3 via a compositional approach. We observe this approach performs competitively with the state-of-the-art on image, video, and speech recognition tasks. The resulting models are not yet being broadly released as they are still under development.
Abstract:Remote photoplethysmography (rPPG) has gained significant attention in recent years for its ability to extract physiological signals from facial videos. While existing rPPG measurement methods have shown satisfactory performance in intra-dataset and cross-dataset scenarios, they often overlook the incremental learning scenario, where training data is presented sequentially, resulting in the issue of catastrophic forgetting. Meanwhile, most existing class incremental learning approaches are unsuitable for rPPG measurement. In this paper, we present a novel method named ADDP to tackle continual learning for rPPG measurement. We first employ adapter to efficiently finetune the model on new tasks. Then we design domain prototypes that are more applicable to rPPG signal regression than commonly used class prototypes. Based on these prototypes, we propose a feature augmentation strategy to consolidate the past knowledge and an inference simplification strategy to convert potentially forgotten tasks into familiar ones for the model. To evaluate ADDP and enable fair comparisons, we create the first continual learning protocol for rPPG measurement. Comprehensive experiments demonstrate the effectiveness of our method for rPPG continual learning. Source code is available at \url{https://github.com/MayYoY/rPPGDIL}
Abstract:To address challenges in the digital economy's landscape of digital intelligence, large language models (LLMs) have been developed. Improvements in computational power and available resources have significantly advanced LLMs, allowing their integration into diverse domains for human life. Medical LLMs are essential application tools with potential across various medical scenarios. In this paper, we review LLM developments, focusing on the requirements and applications of medical LLMs. We provide a concise overview of existing models, aiming to explore advanced research directions and benefit researchers for future medical applications. We emphasize the advantages of medical LLMs in applications, as well as the challenges encountered during their development. Finally, we suggest directions for technical integration to mitigate challenges and potential research directions for the future of medical LLMs, aiming to meet the demands of the medical field better.
Abstract:Spiking neural networks (SNNs) have attracted extensive attentions in Brain-inspired Artificial Intelligence and computational neuroscience. They can be used to simulate biological information processing in the brain at multiple scales. More importantly, SNNs serve as an appropriate level of abstraction to bring inspirations from brain and cognition to Artificial Intelligence. In this paper, we present the Brain-inspired Cognitive Intelligence Engine (BrainCog) for creating brain-inspired AI and brain simulation models. BrainCog incorporates different types of spiking neuron models, learning rules, brain areas, etc., as essential modules provided by the platform. Based on these easy-to-use modules, BrainCog supports various brain-inspired cognitive functions, including Perception and Learning, Decision Making, Knowledge Representation and Reasoning, Motor Control, and Social Cognition. These brain-inspired AI models have been effectively validated on various supervised, unsupervised, and reinforcement learning tasks, and they can be used to enable AI models to be with multiple brain-inspired cognitive functions. For brain simulation, BrainCog realizes the function simulation of decision-making, working memory, the structure simulation of the Neural Circuit, and whole brain structure simulation of Mouse brain, Macaque brain, and Human brain. An AI engine named BORN is developed based on BrainCog, and it demonstrates how the components of BrainCog can be integrated and used to build AI models and applications. To enable the scientific quest to decode the nature of biological intelligence and create AI, BrainCog aims to provide essential and easy-to-use building blocks, and infrastructural support to develop brain-inspired spiking neural network based AI, and to simulate the cognitive brains at multiple scales. The online repository of BrainCog can be found at https://github.com/braincog-x.