Abstract:Convolutional Neural Networks (CNNs) continue to achieve great success in classification tasks as innovative techniques and complex multi-path architecture topologies are introduced. Neural Architecture Search (NAS) aims to automate the design of these complex architectures, reducing the need for costly manual design work by human experts. Cellular Encoding (CE) is an evolutionary computation technique which excels in constructing novel multi-path topologies of varying complexity and has recently been applied with NAS to evolve CNN architectures for various classification tasks. However, existing CE approaches have severe limitations. They are restricted to only one domain, only partially implement the theme of CE, or only focus on the micro-architecture search space. This paper introduces a new CE representation and algorithm capable of evolving novel multi-path CNN architectures of varying depth, width, and complexity for image and text classification tasks. The algorithm explicitly focuses on the macro-architecture search space. Furthermore, by using a surrogate model approach, we show that the algorithm can evolve a performant CNN architecture in less than one GPU day, thereby allowing a sufficient number of experiment runs to be conducted to achieve scientific robustness. Experiment results show that the approach is highly competitive, defeating several state-of-the-art methods, and is generalisable to both the image and text domains.
Abstract:DenseNet architectures have demonstrated impressive performance in image classification tasks, but limited research has been conducted on using character-level DenseNet (char-DenseNet) architectures for text classification tasks. It is not clear what DenseNet architectures are optimal for text classification tasks. The iterative task of designing, training and testing of char-DenseNets is an NP-Hard problem that requires expert domain knowledge. Evolutionary deep learning (EDL) has been used to automatically design CNN architectures for the image classification domain, thereby mitigating the need for expert domain knowledge. This study demonstrates the first work on using EDL to evolve char-DenseNet architectures for text classification tasks. A novel genetic programming-based algorithm (GP-Dense) coupled with an indirect-encoding scheme, facilitates the evolution of performant char DenseNet architectures. The algorithm is evaluated on two popular text datasets, and the best-evolved models are benchmarked against four current state-of-the-art character-level CNN and DenseNet models. Results indicate that the algorithm evolves performant models for both datasets that outperform two of the state-of-the-art models in terms of model accuracy and three of the state-of-the-art models in terms of parameter size.
Abstract:Character-level convolutional neural networks (char-CNN) require no knowledge of the semantic or syntactic structure of the language they classify. This property simplifies its implementation but reduces its classification accuracy. Increasing the depth of char-CNN architectures does not result in breakthrough accuracy improvements. Research has not established which char-CNN architectures are optimal for text classification tasks. Manually designing and training char-CNNs is an iterative and time-consuming process that requires expert domain knowledge. Evolutionary deep learning (EDL) techniques, including surrogate-based versions, have demonstrated success in automatically searching for performant CNN architectures for image analysis tasks. Researchers have not applied EDL techniques to search the architecture space of char-CNNs for text classification tasks. This article demonstrates the first work in evolving char-CNN architectures using a novel EDL algorithm based on genetic programming, an indirect encoding and surrogate models, to search for performant char-CNN architectures automatically. The algorithm is evaluated on eight text classification datasets and benchmarked against five manually designed CNN architecture and one long short-term memory (LSTM) architecture. Experiment results indicate that the algorithm can evolve architectures that outperform the LSTM in terms of classification accuracy and five of the manually designed CNN architectures in terms of classification accuracy and parameter count.