Abstract:Large Model (LM) agents, powered by large foundation models such as GPT-4 and DALL-E 2, represent a significant step towards achieving Artificial General Intelligence (AGI). LM agents exhibit key characteristics of autonomy, embodiment, and connectivity, allowing them to operate across physical, virtual, and mixed-reality environments while interacting seamlessly with humans, other agents, and their surroundings. This paper provides a comprehensive survey of the state-of-the-art in LM agents, focusing on the architecture, cooperation paradigms, security, privacy, and future prospects. Specifically, we first explore the foundational principles of LM agents, including general architecture, key components, enabling technologies, and modern applications. Then, we discuss practical collaboration paradigms from data, computation, and knowledge perspectives towards connected intelligence of LM agents. Furthermore, we systematically analyze the security vulnerabilities and privacy breaches associated with LM agents, particularly in multi-agent settings. We also explore their underlying mechanisms and review existing and potential countermeasures. Finally, we outline future research directions for building robust and secure LM agent ecosystems.
Abstract:We introduce Seed-Music, a suite of music generation systems capable of producing high-quality music with fine-grained style control. Our unified framework leverages both auto-regressive language modeling and diffusion approaches to support two key music creation workflows: \textit{controlled music generation} and \textit{post-production editing}. For controlled music generation, our system enables vocal music generation with performance controls from multi-modal inputs, including style descriptions, audio references, musical scores, and voice prompts. For post-production editing, it offers interactive tools for editing lyrics and vocal melodies directly in the generated audio. We encourage readers to listen to demo audio examples at https://team.doubao.com/seed-music .
Abstract:Musical expression requires control of both which notes are played and how they are performed. Conventional audio synthesizers provide detailed expressive controls, but at the cost of realism. Black-box neural audio synthesis and concatenative samplers can produce realistic audio, but have few mechanisms for control. In this work, we introduce MIDI-DDSP, a hierarchical model of musical instruments that enables both realistic neural audio synthesis and detailed user control. Starting from interpretable Differentiable Digital Signal Processing (DDSP) synthesis parameters, we infer musical notes and high-level properties of their expressive performance (such as timbre, vibrato, dynamics, and articulation). This creates a 3-level hierarchy (notes, performance, synthesis) that affords individuals the option to intervene at each level, or utilize trained priors (performance given notes, synthesis given performance) for creative assistance. Through quantitative experiments and listening tests, we demonstrate that this hierarchy can reconstruct high-fidelity audio, accurately predict performance attributes for a note sequence, independently manipulate the attributes of a given performance, and, as a complete system, generate realistic audio from a novel note sequence. By utilizing an interpretable hierarchy with multiple levels of granularity, MIDI-DDSP opens the door to assistive tools that empower individuals across a diverse range of musical experience.
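To make the idea of interpretable synthesis parameters concrete, the following is a minimal sketch of harmonic additive synthesis with vibrato. It is not MIDI-DDSP's model or the DDSP library's API: the function name `synthesize_note`, its parameters, and the 16 kHz sample rate are all assumptions chosen for illustration. It only shows how a fundamental frequency, a vibrato setting, and per-harmonic amplitudes can directly determine a waveform.

```python
import math


def synthesize_note(f0_hz, duration_s, harmonic_amps,
                    vibrato_rate_hz=5.0, vibrato_depth_hz=0.0,
                    sample_rate=16000):
    """Render one note as a sum of sinusoidal harmonics (illustrative only).

    f0_hz            -- fundamental frequency of the note
    harmonic_amps    -- amplitude of harmonic 1, 2, 3, ... (hypothetical control)
    vibrato_*        -- sinusoidal modulation applied to the fundamental
    """
    n_samples = round(duration_s * sample_rate)
    phase = 0.0
    samples = []
    for t in range(n_samples):
        # Instantaneous frequency: fundamental plus a slow sinusoidal vibrato.
        freq = f0_hz + vibrato_depth_hz * math.sin(
            2.0 * math.pi * vibrato_rate_hz * t / sample_rate)
        # Accumulate phase so that frequency changes stay continuous.
        phase += 2.0 * math.pi * freq / sample_rate
        # Harmonic k oscillates at k times the fundamental phase.
        value = sum(amp * math.sin((k + 1) * phase)
                    for k, amp in enumerate(harmonic_amps))
        samples.append(value)
    return samples


# A 220 Hz note with three harmonics and a 3 Hz-deep vibrato.
note = synthesize_note(220.0, 0.5, [1.0, 0.5, 0.25], vibrato_depth_hz=3.0)
```

Changing `harmonic_amps` alters the timbre and changing `vibrato_depth_hz` alters the vibrato, which is the sense in which such parameters are interpretable. A learned model would predict these controls rather than take them by hand.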
Abstract:In this paper, we consider the use of structure learning methods for probabilistic graphical models to identify statistical dependencies in high-dimensional physical processes. Such processes are often synthetically characterized using partial differential equations (PDEs) and are observed in a variety of natural phenomena, including geoscience data capturing atmospheric and hydrological phenomena. Classical structure learning approaches such as the PC algorithm and its variants are challenging to apply due to their high computational and sample requirements. Modern approaches, often based on sparse regression and variants, do come with finite-sample guarantees, but are usually highly sensitive to the choice of hyper-parameters, e.g., the parameter $\lambda$ of the sparsity-inducing constraint or regularization. In this paper, we present ACLIME-ADMM, an efficient two-step algorithm for adaptive structure learning, which estimates an edge-specific parameter $\lambda_{ij}$ in the first step, and uses these parameters to learn the structure in the second step. Both steps of our algorithm use (inexact) ADMM to solve suitable linear programs, and all iterations can be done in closed form in an efficient block-parallel manner. We compare ACLIME-ADMM with baselines on both synthetic data simulated by PDEs that model advection-diffusion processes, and real data (50 years of daily global geopotential heights) to study information flow in the atmosphere. ACLIME-ADMM is shown to be efficient, stable, and competitive, and usually better than the baselines, especially on difficult problems. On real data, ACLIME-ADMM recovers the underlying structure of global atmospheric circulation, including switches in wind directions at the equator and tropics, entirely from the data.
Abstract:Causal discovery algorithms based on probabilistic graphical models have emerged in geoscience applications for the identification and visualization of dynamical processes. The key idea is to learn the structure of a graphical model from observed spatio-temporal data, which indicates information flow, and thus pathways of interaction, in the observed physical system. Studying those pathways allows geoscientists to learn subtle details about the underlying dynamical mechanisms governing our planet. Initial studies using this approach on real-world atmospheric data have shown great potential for scientific discovery. However, no ground truth was available in these initial studies, so the resulting graphs were evaluated only by whether a domain expert judged them physically plausible. This paper seeks to fill this gap. We develop a testbed that emulates two dynamical processes dominant in many geoscience applications, namely advection and diffusion, in a 2D grid. Then we apply the causal-discovery-based information-tracking algorithms to the simulation data to study how well the algorithms work for different scenarios and to gain a better understanding of the physical meaning of the graph results, in particular of instantaneous connections. We make all data sets used in this study available to the community as a benchmark. Keywords: Information flow, graphical model, structure learning, causal discovery, geoscience.
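As an illustration of what an advection-diffusion testbed on a 2D grid can look like, here is a minimal sketch using an explicit finite-difference scheme (first-order upwind for advection, central differences for diffusion) with periodic boundaries. This is not the authors' testbed: the function names `step` and `simulate`, the grid size, the velocities `u` and `v`, and the diffusivity `kappa` are all hypothetical choices for illustration.

```python
def step(field, u, v, kappa, dt=1.0, dx=1.0):
    """Advance a 2D scalar field by one explicit time step (periodic grid).

    u is the velocity along columns, v the velocity along rows, and kappa
    the diffusivity. The scheme is only stable for small enough
    (|u| + |v|) * dt / dx + 4 * kappa * dt / dx**2 (at most 1).
    """
    n_rows, n_cols = len(field), len(field[0])
    new = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            c = field[i][j]
            up = field[(i - 1) % n_rows][j]
            down = field[(i + 1) % n_rows][j]
            left = field[i][(j - 1) % n_cols]
            right = field[i][(j + 1) % n_cols]
            # Upwind differences: use the neighbour the flow arrives from.
            dc_dx = (c - left) if u >= 0 else (right - c)
            dc_dy = (c - up) if v >= 0 else (down - c)
            # Five-point Laplacian for the diffusion term.
            laplacian = left + right + up + down - 4.0 * c
            new[i][j] = (c
                         - dt / dx * (u * dc_dx + v * dc_dy)
                         + kappa * dt / dx ** 2 * laplacian)
    return new


def simulate(n=16, steps=10, u=0.5, v=0.0, kappa=0.1):
    """Release a unit pulse and return the list of fields over time."""
    field = [[0.0] * n for _ in range(n)]
    field[n // 2][n // 4] = 1.0  # hypothetical initial point source
    history = [field]
    for _ in range(steps):
        field = step(field, u, v, kappa)
        history.append(field)
    return history


history = simulate()
```

Because the velocity and diffusivity are known by construction, the direction in which information actually travels across the grid is known too, which is what makes simulated data of this kind usable as ground truth for graphs learned from the time series at each grid point.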
Abstract:Studies of social and group behavior in interacting organisms require high-throughput analysis of the motion of a large number of individual subjects. Computer vision techniques offer solutions to specific tracking problems, and allow automated and efficient tracking with minimal human intervention. In this work, we adopt the open active contour model to track the trajectories of moving objects at high density. We add repulsive interactions between open contours to the original model, treat the trajectories as an extrusion in the temporal dimension, and show applications to two tracking problems. The walking behavior of Drosophila is studied at different population densities and gender compositions. We demonstrate that individual male flies have distinct walking signatures, and that the social interaction between flies in a mixed-gender arena is gender-specific. We also apply our model to studies of trajectories of gliding Myxococcus xanthus bacteria at high density. We examine the individual gliding behavioral statistics in terms of the gliding speed distribution. Using these two examples at very different spatial scales, we illustrate the use of our algorithm for tracking both short rigid bodies (Drosophila) and long flexible objects (Myxococcus xanthus). Our repulsive active membrane model reaches error rates below $5\times 10^{-6}$ per fly per second for Drosophila tracking and comparable results for Myxococcus xanthus.