Baoyuan
Abstract: This manuscript signals a new era in the integration of artificial intelligence with software engineering, placing machines at the pinnacle of coding capability. We present a formalized, iterative methodology proving that AI can fully replace human programmers in all aspects of code creation and refinement. Our approach, which combines large language models with formal verification, test-driven development, and incremental architectural guidance, achieves a 38.6% improvement over the current top performer's 48.33% accuracy on the SWE-bench benchmark. This surpasses previously assumed limits, marking the end of human-exclusive coding and the rise of autonomous AI-driven software innovation. More than a technical advance, our work challenges centuries-old assumptions about human creativity. We provide robust evidence of AI superiority, demonstrating tangible gains in practical engineering contexts and laying the foundation for a future in which computational creativity outpaces human ingenuity.
Abstract: Positive-Unlabelled (PU) learning is a growing area of machine learning that aims to learn classifiers from data consisting of labelled positive and unlabelled instances. Whilst much work has been done proposing methods for PU learning, little has been written on the subject of evaluating these methods. Many popular standard classification metrics cannot be precisely calculated due to the absence of fully labelled data, so alternative approaches must be taken. This short commentary paper critically reviews the main PU learning evaluation approaches and the choice of predictive accuracy measures in 51 articles proposing PU classifiers and provides practical recommendations for improvements in this area.
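To illustrate why standard metrics are out of reach with PU data, and what an alternative can look like, the following minimal sketch computes two quantities that need only labelled-positive and unlabelled instances: recall estimated on the labelled positives, and the proxy score recall² / Pr(ŷ = 1) discussed in the PU learning literature. The function name `pu_metrics` and the toy data are hypothetical, and the recall estimate assumes that labelled positives are a random sample of all positives; nothing here is taken from the reviewed articles.

```python
def pu_metrics(y_pred, s):
    """Evaluate binary predictions using PU labels only.

    y_pred: list of 0/1 predictions, one per instance.
    s:      list of 0/1 flags, 1 = labelled positive, 0 = unlabelled.
    Returns (estimated recall, fraction predicted positive, proxy score).
    """
    if len(y_pred) != len(s):
        raise ValueError("y_pred and s must have the same length")
    n_labelled = sum(s)
    if n_labelled == 0:
        raise ValueError("at least one labelled positive is required")

    # Recall estimated on labelled positives only (assumes they are a
    # random sample of all positives, which is an assumption of this sketch).
    hits = sum(p for p, lab in zip(y_pred, s) if lab == 1)
    recall_hat = hits / n_labelled

    # Fraction of all instances predicted positive; computable without
    # knowing any true negative labels.
    frac_pos = sum(y_pred) / len(y_pred)

    # Proxy score recall^2 / Pr(y_hat = 1); defined as 0 when nothing is
    # predicted positive.
    proxy = recall_hat ** 2 / frac_pos if frac_pos > 0 else 0.0
    return recall_hat, frac_pos, proxy


# Toy example: 3 labelled positives, 5 unlabelled instances.
y_pred = [1, 1, 0, 1, 0, 0, 1, 0]
s      = [1, 1, 1, 0, 0, 0, 0, 0]
recall_hat, frac_pos, proxy = pu_metrics(y_pred, s)
print(recall_hat, frac_pos, proxy)
```

Precision and accuracy are deliberately absent: both require knowing which unlabelled instances are truly negative, which PU data does not provide.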
Abstract: Swarm robotics systems are envisioned to become an important component of both academic research and real-world applications. However, in order to reach widespread adoption, new models that ensure the secure cooperation of these systems need to be developed. This work proposes a novel model to encapsulate cooperative robotic missions in Merkle trees, one of the fundamental components of blockchain technology. With the proposed model, swarm operators can provide the "blueprint" of the swarm's mission without disclosing raw data about the mission itself. In other words, data verification can be separated from the data itself. We propose a system where swarm robots have to "prove" their integrity to their peers by exchanging cryptographic proofs. This work analyzes and tests the proposed approach for two different robotic missions: foraging (where robots modify the environment) and maze formation (where robots become part of the environment). In both missions, robots were able to cooperate and carry out sequential operations in the correct order without having explicit knowledge about the mission's high-level goals or objectives. The performance, communication costs, and information diversity requirements for the proposed approach are analyzed. Finally, conclusions are drawn and future work directions are suggested.
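The idea of separating verification from the data can be sketched with a generic Merkle tree: a verifier holding only the root hash can check that a claimed step belongs to the mission, given the sibling hashes on the path to the root. This is a minimal illustration, not the encoding used in the paper. The step strings, the use of SHA-256, the duplication of the last node on odd levels, and the function names `build_levels`, `make_proof`, and `verify` are all assumptions made for the example.

```python
import hashlib


def h(data: bytes) -> bytes:
    """SHA-256 digest (hash function chosen for this sketch)."""
    return hashlib.sha256(data).digest()


def build_levels(leaves):
    """Return all tree levels, from leaf hashes up to the single root."""
    level = [h(x) for x in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]  # duplicate last node on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def make_proof(levels, index):
    """Collect (sibling_hash, sibling_is_left) pairs for the leaf at index."""
    proof = []
    for level in levels[:-1]:
        if len(level) % 2 == 1:
            level = level + [level[-1]]
        sibling = index ^ 1  # neighbour within the same pair
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify(leaf, proof, root):
    """Recompute the root from a leaf and its proof; compare to the known root."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


# Hypothetical mission steps; a verifier only ever needs `root`.
steps = [b"step-0: find item", b"step-1: carry item", b"step-2: drop item"]
levels = build_levels(steps)
root = levels[-1][0]

proof = make_proof(levels, 2)
print(verify(steps[2], proof, root))         # True
print(verify(b"step-2: forged", proof, root))  # False
```

The proof contains one hash per tree level, so its size grows logarithmically with the number of steps, which is relevant to the communication costs the abstract mentions.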
Abstract: Research has proven that stress reduces quality of life and causes many diseases. For this reason, several researchers have devised stress detection systems based on physiological parameters. However, these systems require the user to carry obtrusive sensors continuously. In our paper, we propose an alternative approach, providing evidence that daily stress can be reliably recognized based on behavioral metrics derived from the user's mobile phone activity and from additional indicators, such as the weather conditions (data pertaining to transitory properties of the environment) and personality traits (data concerning permanent dispositions of individuals). Our multifactorial statistical model, which is person-independent, obtains an accuracy score of 72.28% for a 2-class daily stress recognition problem. The model is efficient to implement for most multimedia applications because of its low-dimensional feature space (32 dimensions). Moreover, we identify and discuss the indicators that have strong predictive power.