Abstract:We present a rigorous, human-in-the-loop evaluation framework for assessing the performance of AI agents on the task of Air Traffic Control, grounded in a regulator-certified simulator-based curriculum used for training and testing real-world trainee controllers. By leveraging legally regulated assessments and involving expert human instructors in the evaluation process, our framework enables a more authentic and domain-accurate measurement of AI performance. This work addresses a critical gap in the existing literature: the frequent misalignment between academic representations of Air Traffic Control and the complexities of the actual operational environment. It also lays the foundations for effective future human-machine teaming paradigms by aligning machine performance with human assessment targets.
Abstract:Digital Twins combine simulation, operational data and Artificial Intelligence (AI), and have the potential to bring significant benefits across the aviation industry. Project Bluebird, an industry-academic collaboration, has developed a probabilistic Digital Twin of en route UK airspace as an environment for training and testing AI Air Traffic Control (ATC) agents. There is a developing regulatory landscape for this kind of novel technology. Regulatory requirements are expected to be application specific, and may need to be tailored to each specific use case. We draw on emerging guidance for both Digital Twin development and the use of Artificial Intelligence/Machine Learning (AI/ML) in Air Traffic Management (ATM) to present an assurance framework. This framework defines actionable goals and the evidence required to demonstrate that a Digital Twin accurately represents its physical counterpart and also provides sufficient functionality across target use cases. It provides a structured approach for researchers to assess, understand and document the strengths and limitations of the Digital Twin, whilst also identifying areas where fidelity could be improved. Furthermore, it serves as a foundation for engagement with stakeholders and regulators, supporting discussions around the regulatory needs for future applications, and contributing to the emerging guidance through a concrete, working example of a Digital Twin. The framework leverages a methodology known as Trustworthy and Ethical Assurance (TEA) to develop an assurance case. An assurance case is a nested set of structured arguments that provides justified evidence for how a top-level goal has been realised. In this paper we provide an overview of each structured argument and a number of deep dives which elaborate in more detail upon particular arguments, including the required evidence, assumptions and justifications.
Abstract:Trajectory prediction (TP) plays an important role in supporting the decision-making of Air Traffic Controllers (ATCOs). Traditional TP methods are deterministic and physics-based, with parameters that are calibrated using aircraft surveillance data harvested across the world. These models are, therefore, agnostic to the intentions of the pilots and ATCOs, which can have a significant effect on the observed trajectory, particularly in the lateral plane. This work proposes a generative method for lateral TP, using probabilistic machine learning to model the effect of the epistemic uncertainty arising from the unknown effect of pilot behaviour and ATCO intentions. The models are trained to be specific to a particular sector, allowing local procedures such as coordinated entry and exit points to be modelled. A dataset comprising a week's worth of aircraft surveillance data, passing through a busy sector of the United Kingdom's upper airspace, was used to train and test the models. Specifically, a piecewise linear model was used as a functional, low-dimensional representation of the ground tracks, with its control points determined by a generative model conditioned on partial context. It was found that, of the investigated models, a Bayesian Neural Network using the Laplace approximation was able to generate the most plausible trajectories in order to emulate the flow of traffic through the sector.
Abstract:Accurate trajectory prediction (TP) for climbing aircraft is hampered by the presence of epistemic uncertainties concerning aircraft operation, which can lead to significant misspecification between predicted and observed trajectories. This paper proposes a generative model for climbing aircraft in which the standard Base of Aircraft Data (BADA) model is enriched by a functional correction to the thrust that is learned from data. The method offers three features: predictions of the arrival time with 66.3% less error when compared to BADA; generated trajectories that are realistic when compared to test data; and a means of computing confidence bounds for minimal computational cost.