Abstract:Foundation models have demonstrated remarkable generalization, data efficiency, and robustness properties across various domains. In this paper, we explore the feasibility of foundation models for applications in the control domain. The success of these models is enabled by large-scale pretaining on Internet-scale datasets. These are available in fields like natural language processing and computer vision, but do not exist for dynamical systems. We address this challenge by pretraining a transformer-based foundation model exclusively on synthetic data and propose to sample dynamics functions from a reproducing kernel Hilbert space. Our pretrained model generalizes for prediction tasks across different dynamical systems, which we validate in simulation and hardware experiments, including cart-pole and Furuta pendulum setups. Additionally, the model can be fine-tuned effectively to new systems to increase performance even further. Our results demonstrate the feasibility of foundation models for dynamical systems that outperform specialist models in terms of generalization, data efficiency, and robustness.
Abstract:Convolutional neural networks are being increasingly used in critical systems, where ensuring their robustness and alignment is crucial. In this context, the field of explainable artificial intelligence has proposed the generation of high-level explanations through concept extraction. These methods detect whether a concept is present in an image, but are incapable of locating where. What is more, a fair comparison of approaches is difficult, as proper validation procedures are missing. To fill these gaps, we propose a novel method for automatic concept extraction and localization based on representations obtained through the pixel-wise aggregations of activation maps of CNNs. Further, we introduce a process for the validation of concept-extraction techniques based on synthetic datasets with pixel-wise annotations of their main components, reducing human intervention. Through extensive experimentation on both synthetic and real-world datasets, our method achieves better performance in comparison to state-of-the-art alternatives.