Abstract:We study the addition of shape constraints and their consideration during the parameter estimation step of symbolic regression (SR). Shape constraints serve as a means to introduce prior knowledge about the shape of the otherwise unknown model function into SR. Unlike previous works that have explored shape constraints in SR, we propose minimizing shape constraint violations during parameter estimation using gradient-based numerical optimization. We test three algorithm variants to evaluate their performance in identifying three symbolic expressions from a synthetically generated data set. This paper examines two benchmark scenarios: one with varying noise levels and another with reduced amounts of training data. The results indicate that incorporating shape constraints into the expression search is particularly beneficial when data is scarce. Compared to using shape constraints only in the selection process, our approach of minimizing violations during parameter estimation shows a statistically significant benefit in some of our test cases, without being significantly worse in any instance.
Abstract:When developing empirical equations, domain experts require these to be accurate and adhere to physical laws. Often, constants with unknown units need to be discovered alongside the equations. Traditional unit-aware genetic programming (GP) approaches cannot be used when unknown constants with undetermined units are included. This paper presents a method for dimensional analysis that propagates unknown units as ''jokers'' and returns the magnitude of unit violations. We propose three methods, namely evolutive culling, a repair mechanism, and a multi-objective approach, to integrate the dimensional analysis in the GP algorithm. Experiments on datasets with ground truth demonstrate comparable performance of evolutive culling and the multi-objective approach to a baseline without dimensional analysis. Extensive analysis of the results on datasets without ground truth reveals that the unit-aware algorithms make only low sacrifices in accuracy, while producing unit-adherent solutions. Overall, we presented a promising novel approach for developing unit-adherent empirical equations.
Abstract:Thermodynamic equations of state (EOS) are essential for many industries as well as in academia. Even leaving aside the expensive and extensive measurement campaigns required for the data acquisition, the development of EOS is an intensely time-consuming process, which does often still heavily rely on expert knowledge and iterative fine-tuning. To improve upon and accelerate the EOS development process, we introduce thermodynamics-informed symbolic regression (TiSR), a symbolic regression (SR) tool aimed at thermodynamic EOS modeling. TiSR is already a capable SR tool, which was used in the research of https://doi.org/10.1007/s10765-023-03197-z. It aims to combine an SR base with the extensions required to work with often strongly scattered experimental data, different residual pre- and post-processing options, and additional features required to consider thermodynamic EOS development. Although TiSR is not ready for end users yet, this paper is intended to report on its current state, showcase the progress, and discuss (distant and not so distant) future directions. TiSR is available at https://github.com/scoop-group/TiSR and can be cited as https://doi.org/10.5281/zenodo.8317547.