Abstract:Real-time speech interaction, serving as a fundamental interface for human-machine collaboration, holds immense potential. However, current open-source models face limitations such as high costs in voice data collection, weakness in dynamic control, and limited intelligence. To address these challenges, this paper introduces Step-Audio, the first production-ready open-source solution. Key contributions include: 1) a 130B-parameter unified speech-text multi-modal model that achieves unified understanding and generation, with the Step-Audio-Chat version open-sourced; 2) a generative speech data engine that establishes an affordable voice cloning framework and produces the open-sourced lightweight Step-Audio-TTS-3B model through distillation; 3) an instruction-driven fine control system enabling dynamic adjustments across dialects, emotions, singing, and RAP; 4) an enhanced cognitive architecture augmented with tool calling and role-playing abilities to manage complex tasks effectively. Based on our new StepEval-Audio-360 evaluation benchmark, Step-Audio achieves state-of-the-art performance in human evaluations, especially in terms of instruction following. On open-source benchmarks like LLaMA Question, shows 9.3% average performance improvement, demonstrating our commitment to advancing the development of open-source multi-modal language technologies. Our code and models are available at https://github.com/stepfun-ai/Step-Audio.
Abstract:This research studies the network traffic signal control problem. It uses the Lyapunov control function to derive the back pressure method, which is equal to differential queue lengths weighted by intersection lane flows. Lyapunov control theory is a platform that unifies several current theories for intersection signal control. We further use the theorem to derive the flow-based and other pressure-based signal control algorithms. For example, the Dynamic, Optimal, Real-time Algorithm for Signals (DORAS) algorithm may be derived by defining the Lyapunov function as the sum of queue length. The study then utilizes the back pressure as a reward in the reinforcement learning (RL) based network signal control, whose agent is trained with double Deep Q-Network (Double-DQN). The proposed algorithm is compared with several traditional and RL-based methods under passenger traffic flow and mixed flow with freight traffic, respectively. The numerical tests are conducted on a single corridor and on a local grid network under three traffic demand scenarios of low, medium, and heavy traffic, respectively. The numerical simulation demonstrates that the proposed algorithm outperforms the others in terms of the average vehicle waiting time on the network.