Abstract: In recent years, there has been growing interest in accelerated materials innovation in both research and industry. However, to truly add value to the development of new advanced materials, manufacturing processes must be taken into account, and materials design approaches must be tailored to support downstream process design. As a major step in this direction, we present a holistic optimization approach that covers the entire materials process-structure-property chain. Our approach employs machine learning techniques to address two critical identification problems. The first is a materials design problem: identifying near-optimal material structures that exhibit desired macroscopic properties. The second is a process design problem: finding an optimal processing path to manufacture these material structures. Both identification problems are typically ill-posed, which presents a significant challenge for solution approaches. However, the non-unique nature of these problems also offers an important advantage for processing: since several target structures perform similarly well, the corresponding processes can be efficiently guided towards manufacturing the best reachable structure. In particular, we apply deep reinforcement learning for process design in combination with a multi-task learning-based optimization approach for materials design. We demonstrate the functionality of the approach by using it to manufacture crystallographic textures with desired properties in a metal forming process.
Abstract: In real-world decision optimization, multiple competing objectives often have to be taken into account. In classical reinforcement learning, these objectives must be combined into a single reward function. In contrast, multi-objective reinforcement learning (MORL) methods learn from vectors of per-objective rewards instead. In the case of multi-policy MORL, sets of decision policies for various preferences regarding the conflicting objectives are optimized. This is especially important when target preferences are not known during training or when preferences change dynamically during application. While it is, in general, straightforward to extend a single-objective reinforcement learning method to MORL based on linear scalarization, the solutions reachable by such methods are limited to convex regions of the Pareto front. Non-linear MORL methods like Thresholded Lexicographic Ordering (TLO) are designed to overcome this limitation. Generalized MORL methods utilize function approximation to generalize across objective preferences and thereby implicitly learn multiple policies in a data-efficient manner, even for complex decision problems with high-dimensional or continuous state spaces. In this work, we propose \textit{generalized Thresholded Lexicographic Ordering} (gTLO), a novel method that aims to combine non-linear MORL with the advantages of generalized MORL. We introduce a deep reinforcement learning realization of the algorithm and present promising results on a standard benchmark for non-linear MORL and a real-world application from the domain of manufacturing process control.
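A minimal sketch of the thresholded lexicographic action-selection rule that gTLO builds on is given below. The NumPy-based implementation, the toy Q-values, and the single threshold are illustrative assumptions for demonstration; this is not the paper's full gTLO algorithm.

```python
import numpy as np

def tlo_action(q_values, thresholds):
    """Thresholded lexicographic action selection (illustrative sketch).

    q_values   : array of shape (n_actions, n_objectives), per-objective
                 action-value estimates for the current state
    thresholds : array of shape (n_objectives - 1,), minimum acceptable
                 values for all objectives except the last one
    """
    q = np.asarray(q_values, dtype=float)
    clipped = q.copy()
    # Values above a threshold are treated as "good enough": clipping removes
    # the incentive to over-optimize a higher-priority objective.
    clipped[:, :-1] = np.minimum(clipped[:, :-1], thresholds)
    # Lexicographic comparison: objective 0 has the highest priority; the last
    # (unthresholded) objective decides among actions that satisfy the others.
    keys = tuple(clipped[:, i] for i in reversed(range(clipped.shape[1])))
    order = np.lexsort(keys)          # ascending lexicographic order
    return int(order[-1])             # best action under the TLO ordering

# Toy usage: two objectives, three actions, threshold 0.5 on objective 0.
print(tlo_action([[0.9, 0.1], [0.6, 0.4], [0.3, 0.8]], thresholds=[0.5]))
```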
Abstract: Optimization along the processing-structure-properties-performance chain is one of the core objectives in data-driven materials science. In this context, processes are supposed to manufacture workpieces with targeted material microstructures. These microstructures are defined by the material properties of interest, and identifying them is a question of materials design. In the present paper, we address this issue and introduce a generic multi-task learning-based optimization approach. The approach enables the identification of sets of highly diverse microstructures for given desired properties and corresponding tolerances. The approach consists of an optimization algorithm that interacts with a machine learning model that combines multi-task learning with siamese neural networks. The resulting model (1) relates microstructures and properties, (2) estimates the likelihood of a microstructure being producible, and (3) performs a distance-preserving microstructure feature extraction to generate a lower-dimensional latent feature space that enables efficient optimization. The proposed approach is applied to a crystallographic texture optimization problem for rolled steel sheets given desired properties.
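The following PyTorch sketch illustrates one possible realization of such a multi-task model: a shared encoder with a property-regression head and a producibility head, plus a siamese-style loss that makes latent distances mimic structure distances. Layer sizes, the loss weighting, and the pairwise training interface are illustrative assumptions, not the exact architecture of the paper.

```python
import torch
import torch.nn as nn

class MultiTaskStructureModel(nn.Module):
    def __init__(self, n_features, latent_dim=16, n_properties=3):
        super().__init__()
        # Shared encoder: microstructure descriptors -> low-dimensional latent space
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        self.property_head = nn.Linear(latent_dim, n_properties)   # task (1)
        self.producibility_head = nn.Sequential(                   # task (2)
            nn.Linear(latent_dim, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)                                         # task (3)
        return z, self.property_head(z), self.producibility_head(z)

def multitask_loss(model, x_a, x_b, props_a, producible_a, dist_ab):
    """Joint loss on a pair of microstructures (siamese usage of the encoder):
    property regression and producibility classification on sample a, plus a
    term that keeps latent distances close to the original structure distances."""
    z_a, p_a, c_a = model(x_a)
    z_b, _, _ = model(x_b)
    loss_prop = nn.functional.mse_loss(p_a, props_a)
    loss_prod = nn.functional.binary_cross_entropy(c_a, producible_a)
    latent_dist = torch.norm(z_a - z_b, dim=1)
    loss_dist = nn.functional.mse_loss(latent_dist, dist_ab)
    return loss_prop + loss_prod + loss_dist
```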
Abstract: A major goal of materials design is the inverse optimization of processing-structure-property relationships. In this paper, we propose and investigate a deep reinforcement learning approach for the optimization of processing paths. The goal is to find optimal processing paths in the material structure space that lead to target structures, which have been identified beforehand to yield desired material properties. The contribution completes the desired inversion of the processing-structure-property chain in a flexible and generic way. As the relation between properties and structures is generally non-unique, typically a whole set of goal structures can be identified that lead to the desired properties. Our proposed method optimizes processing paths from a start structure to one of these equivalent goal structures. The algorithm learns to find near-optimal paths by interacting with the structure-generating process. It is guided by structure descriptors as process state features and by a reward signal that is formulated based on a distance function in the structure space. The model-free reinforcement learning algorithm learns through trial and error while interacting with the process and does not rely on a priori sampled processing data. We instantiate and evaluate the proposed method by optimizing paths of a generic metal forming process to reach near-optimal structures, which are represented by one-point statistics of crystallographic textures.
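A minimal sketch of such a distance-based reward over a set of equivalent goal structures is shown below. The Euclidean distance on one-point statistics and the threshold bonus are illustrative assumptions, not necessarily the exact formulation used in the paper.

```python
import numpy as np

def structure_reward(current_structure, goal_structures, threshold=0.05):
    """Reward for one process step.

    current_structure : 1-D array of structure descriptors, e.g. one-point
                        statistics (orientation histogram) of the texture
    goal_structures   : iterable of equally acceptable target descriptors
    """
    distances = [np.linalg.norm(current_structure - g) for g in goal_structures]
    d_min = min(distances)                 # distance to the nearest goal structure
    reward = -d_min                        # the closer to any goal, the better
    if d_min < threshold:
        reward += 1.0                      # bonus once a goal is (nearly) reached
    return reward
```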
Abstract: In industrial applications of adaptive optimal control, multiple conflicting objectives often have to be considered. The weights (relative importance) of the objectives are often not known during control design and can change with changing production conditions and requirements. In this work, a novel model-free multiobjective reinforcement learning approach for the adaptive optimal control of manufacturing processes is proposed. The approach enables sample-efficient learning across sequences of control configurations, each given by a particular set of objective weights.
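The sketch below illustrates the basic idea of weight-dependent scalarization of per-objective rewards that defines such control configurations. It is a toy illustration of the problem setting, with assumed objectives and weights, and not the proposed sample-efficient learning scheme itself.

```python
import numpy as np

def scalarize(reward_vector, weights):
    """Combine per-objective rewards into one scalar for a given preference."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                       # normalize the relative importance
    return float(np.dot(w, reward_vector))

# The same transition can be evaluated under several weight configurations,
# which is one way to reuse experience when the configuration changes.
transition_rewards = np.array([0.8, -0.2])        # e.g. [quality, energy cost]
for weights in ([1.0, 0.0], [0.5, 0.5], [0.2, 0.8]):
    print(weights, scalarize(transition_rewards, weights))
```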
Abstract: A self-learning optimal control algorithm for sequential manufacturing processes with time-discrete control actions is proposed and evaluated on simulated deep drawing processes. The necessary control model is built during consecutive process executions under optimal control via Reinforcement Learning, using the measured product quality as a reward after each process execution. Prior model formation, which is required by state-of-the-art algorithms like Model Predictive Control and Approximate Dynamic Programming, is therefore unnecessary. This avoids the difficulties of system identification and accurate modelling that arise with processes subject to non-linear dynamics and stochastic influences. The runtime complexity problems of these approaches, which arise when more complex models and longer control prediction horizons are employed, are also avoided. Instead of using pre-created process and observation models, Reinforcement Learning algorithms build functions of expected future reward during processing, which are then used for optimal process control decisions. The learning of such expectation functions is realized online by interacting with the process. The proposed algorithm also takes stochastic variations of the process conditions into consideration and is able to cope with partial observability. A method for the adaptive optimal control of partially observable, fixed-horizon manufacturing processes, based on Q-learning, is developed and studied. The resulting algorithm is instantiated and evaluated by application to a time-stochastic optimal control problem in metal sheet deep drawing, where the experiments use FEM-simulated processes. The Reinforcement Learning-based control shows results superior to those of the model-based Model Predictive Control and Approximate Dynamic Programming approaches.
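A minimal tabular Q-learning sketch for such a fixed-horizon process with a terminal quality reward is given below. The process interface (reset/step/measure_quality), the discretized states and actions, and the hyperparameters are hypothetical placeholders, and the sketch omits the paper's handling of partial observability and stochastic process conditions.

```python
import random
from collections import defaultdict

def run_episode(process, q, horizon, actions, alpha=0.1, gamma=1.0, eps=0.1):
    """One process execution with epsilon-greedy control and backward updates.
    States must be hashable (e.g. discretized observable process quantities)."""
    state = process.reset()
    trajectory = []
    for t in range(horizon):
        if random.random() < eps:                       # epsilon-greedy exploration
            action = random.choice(actions)
        else:
            action = max(actions, key=lambda a: q[(t, state, a)])
        next_state = process.step(action)               # intermediate reward is zero
        trajectory.append((t, state, action, next_state))
        state = next_state
    quality = process.measure_quality()                 # reward after the final step
    # Backward Q-learning updates; only the last step sees a non-zero reward.
    for t, s, a, s_next in reversed(trajectory):
        if t == horizon - 1:
            target = quality
        else:
            target = gamma * max(q[(t + 1, s_next, b)] for b in actions)
        q[(t, s, a)] += alpha * (target - q[(t, s, a)])
    return quality

# q = defaultdict(float)   # usage: repeatedly call run_episode(process, q, ...)
```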