Abstract:Activation functions play a significant role in neural network design by enabling non-linearity. The choice of activation function was previously shown to influence the properties of the resulting loss landscape. Understanding the relationship between activation functions and loss landscape properties is important for neural architecture and training algorithm design. This study empirically investigates neural network loss landscapes associated with hyperbolic tangent, rectified linear unit, and exponential linear unit activation functions. Rectified linear unit is shown to yield the most convex loss landscape, and exponential linear unit is shown to yield the least flat loss landscape, and to exhibit superior generalisation performance. The presence of wide and narrow valleys in the loss landscape is established for all activation functions, and the narrow valleys are shown to correlate with saturated neurons and implicitly regularised network configurations.
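For reference, the three activation functions compared in the abstract above can be written down in a few lines. This is only a sketch of their standard textbook definitions, not the study's own code; the ELU scale `alpha = 1.0` is an assumed common default that the abstract does not state.

```python
import math


def tanh(x: float) -> float:
    """Hyperbolic tangent: bounded in (-1, 1), saturates for large |x|."""
    return math.tanh(x)


def relu(x: float) -> float:
    """Rectified linear unit: zero for negative inputs, identity otherwise."""
    return max(0.0, x)


def elu(x: float, alpha: float = 1.0) -> float:
    """Exponential linear unit: identity for x > 0, smooth and bounded below by -alpha.

    alpha = 1.0 is an assumed default, not a value taken from the abstract.
    """
    return x if x > 0 else alpha * (math.exp(x) - 1.0)
```

Unlike ReLU, ELU keeps a non-zero output and a non-zero slope for negative inputs, while tanh saturates on both sides.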
Abstract:Training feedforward neural networks (FFNNs) can benefit from automation, in which a high-level probabilistic heuristic automatically seeks out the best heuristic to train the network. This research introduces a novel population-based Bayesian hyper-heuristic (BHH) that is used to train FFNNs. The performance of the BHH is compared to that of ten popular low-level heuristics, each with different search behaviours. The chosen heuristic pool consists of classic gradient-based heuristics as well as meta-heuristics (MHs). The empirical process is executed on fourteen datasets consisting of classification and regression problems with varying characteristics. The BHH is shown to be able to train FFNNs well and to provide an automated method for finding the best heuristic to train the FFNNs at various stages of the training process.
Abstract:Intensive livestock production might have a negative environmental impact by producing large amounts of animal manure, which, if not properly managed, can contaminate nearby water bodies with excess nutrients. However, if animal manure is exported to nearby crop fields, to be used as organic fertilizer, pollution can be mitigated. Finding the best logistics solution for satisfying the nutrient needs of crops by means of livestock manure is a single-objective optimization problem. This paper proposes three different approaches to solve the problem: a centralized optimal algorithm (COA), a decentralized nature-inspired cooperative technique based on the foraging behaviour of ants (AIA), and a naive neighbour-based method (NBS), which constitutes the existing practice used today in an ad hoc, uncoordinated manner in Catalonia. Results show that the COA approach is 8.5% more efficient than the AIA. However, the AIA approach is fairer to the farmers and more balanced in terms of the average transportation distances that need to be covered by each livestock farmer, while it is 1.07 times more efficient than the NBS. Our work constitutes the first application of a decentralized AIA to this interesting real-world problem, in a domain where swarm intelligence methods are still under-exploited.
Abstract:Intensive livestock production might have a negative environmental impact by producing large amounts of animal excrement, which, if not properly managed, can contaminate nearby water bodies with excess nutrients. However, if animal manure is exported to distant crop fields, to be used as organic fertilizer, pollution can be mitigated. Finding the best logistics solution for satisfying the nutrient needs of crops by means of livestock manure is a single-objective optimization problem. This paper proposes a dynamic approach to solve the problem, using a decentralized nature-inspired cooperative technique based on the foraging behavior of ants (AIA). Results provide important insights for policy-makers into the potential of using animal manure as fertilizer for crop fields, while AIA solves the problem effectively, in a way that is fair to the farmers and well balanced in terms of the average transportation distances that need to be covered by each livestock farmer. Our work constitutes the first application of a decentralized AIA to this interesting real-world problem, in a domain where swarm intelligence methods are still under-exploited.
Abstract:It has been argued in the past that high-dimensional neural networks do not exhibit local minima capable of trapping an optimisation algorithm. However, the relationship between loss surface modality and the neural architecture parameters, such as the number of hidden neurons per layer and the number of hidden layers, remains poorly understood. This study employs fitness landscape analysis to study the modality of neural network loss surfaces under various feed-forward architecture settings. An increase in the problem dimensionality is shown to yield a more searchable and more exploitable loss surface. An increase in the hidden layer width is shown to effectively reduce the number of local minima, and simplify the shape of the global attractor. An increase in the architecture depth is shown to sharpen the global attractor, thus making it more exploitable.
Abstract:Quantification of the stationary points and the associated basins of attraction of neural network loss surfaces is an important step towards a better understanding of neural network loss surfaces at large. This work proposes a novel method to visualise basins of attraction together with the associated stationary points via gradient-based random sampling. The proposed technique is used to perform an empirical study of the loss surfaces generated by two different error metrics: quadratic loss and entropic loss. The empirical observations confirm the theoretical hypothesis regarding the nature of neural network attraction basins. Entropic loss is shown to exhibit stronger gradients and fewer stationary points than quadratic loss, indicating that entropic loss has a more searchable landscape. Quadratic loss is shown to be more resilient to overfitting than entropic loss. Both losses are shown to exhibit local minima, but the number of local minima is shown to decrease with an increase in dimensionality. Thus, the proposed visualisation technique successfully captures the local minima properties exhibited by the neural network loss surfaces, and can be used for the purpose of fitness landscape analysis of neural networks.
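The claim that entropic loss exhibits stronger gradients than quadratic loss can be illustrated on a single output neuron. The sketch below rests on assumptions the abstract does not state: that "entropic loss" means binary cross-entropy, that the output unit is a sigmoid, and that quadratic loss carries a factor of one half. The function names are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid, assumed here as the output activation."""
    return 1.0 / (1.0 + math.exp(-z))


def quadratic_grad(z: float, target: float) -> float:
    """d/dz of 0.5 * (y - target)**2 with y = sigmoid(z)."""
    y = sigmoid(z)
    return (y - target) * y * (1.0 - y)


def entropic_grad(z: float, target: float) -> float:
    """d/dz of -(t*log(y) + (1 - t)*log(1 - y)) with y = sigmoid(z)."""
    y = sigmoid(z)
    return y - target


# A confidently wrong output: pre-activation -4 while the target is 1.
z, target = -4.0, 1.0
print(abs(quadratic_grad(z, target)), abs(entropic_grad(z, target)))
```

Under these assumptions the quadratic gradient carries the extra factor y(1 - y), which is at most 0.25 and vanishes as the neuron saturates, whereas the cross-entropy gradient does not.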