Noorhaniza Wahid

Parametric Flatten-T Swish: An Adaptive Non-linear Activation Function For Deep Learning

Nov 06, 2020

Hock Hung Chieng, Noorhaniza Wahid, Pauline Ong

Abstract: The activation function is a key component in deep learning that performs non-linear mappings between inputs and outputs. The Rectified Linear Unit (ReLU) has been the most popular activation function across the deep learning community. However, ReLU has several shortcomings that can result in inefficient training of deep neural networks: 1) the negative cancellation property of ReLU tends to treat negative inputs as unimportant information for learning, resulting in performance degradation; 2) the inherent predefined nature of ReLU is unlikely to promote additional flexibility, expressivity, and robustness in the networks; 3) the mean activation of ReLU is highly positive and leads to a bias shift effect in network layers; and 4) the multilinear structure of ReLU restricts the non-linear approximation power of the networks. To tackle these shortcomings, this paper introduces Parametric Flatten-T Swish (PFTS) as an alternative to ReLU. With ReLU as the baseline, the experiments show that PFTS improves classification accuracy on the SVHN dataset by 0.31%, 0.98%, 2.16%, 17.72%, 1.35%, 0.97%, 39.99%, and 71.83% on DNN-3A, DNN-3B, DNN-4, DNN-5A, DNN-5B, DNN-5C, DNN-6, and DNN-7, respectively. PFTS also achieves the highest mean rank among the compared methods. PFTS exhibits higher non-linear approximation power during training and thereby improves the predictive performance of the networks.

* Journal of Information and Communication Technology, 20(1), 21-39, 2021
* 19 pages
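
The abstract does not spell out the PFTS formula, so the following is only a minimal sketch under stated assumptions: it takes the Flatten-T Swish form f(x) = x * sigmoid(x) + T for x >= 0 and f(x) = T otherwise, and treats T as a trainable parameter updated by gradient descent. The class name `PFTS`, the argument `t_init`, the initial value of -0.20, and the helper methods are illustrative choices, not the authors' implementation.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid; only called with x >= 0 below, so exp cannot overflow."""
    return 1.0 / (1.0 + math.exp(-x))


class PFTS:
    """Sketch of a parametric Flatten-T Swish unit (assumed form, see lead-in)."""

    def __init__(self, t_init: float = -0.20):
        # Assumption: T starts at -0.20 and is then learned.
        self.t = t_init

    def forward(self, x: float) -> float:
        # Non-negative inputs: Swish-like curve shifted by T.
        # Negative inputs: flattened to the constant T instead of a hard zero.
        return x * sigmoid(x) + self.t if x >= 0 else self.t

    def grad_x(self, x: float) -> float:
        # Derivative with respect to the input; zero on the flat negative branch.
        if x < 0:
            return 0.0
        s = sigmoid(x)
        return s + x * s * (1.0 - s)

    def grad_t(self, x: float) -> float:
        # T is added on both branches, so the derivative with respect to T is 1.
        return 1.0

    def sgd_step(self, inputs, upstream_grads, lr: float = 0.01) -> None:
        # One plain gradient-descent update of T from a batch of upstream gradients.
        total = sum(g * self.grad_t(x) for x, g in zip(inputs, upstream_grads))
        self.t -= lr * total


act = PFTS()
for x in (-2.0, 0.0, 2.0):
    print(x, act.forward(x), act.grad_x(x))
```

Because T enters both branches additively, its gradient is simply the sum of the upstream gradients, which is what lets each layer shift its own output floor during training in this sketch.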


Flatten-T Swish: a thresholded ReLU-Swish-like activation function for deep learning

Dec 15, 2018

Hock Hung Chieng, Noorhaniza Wahid, Pauline Ong, Sai Raj Kishore Perla

Abstract: Activation functions are essential for deep learning methods to learn and perform complex tasks such as image classification. The Rectified Linear Unit (ReLU) has been widely used and has been the default activation function across the deep learning community since 2012. Despite its popularity, the hard zero property of ReLU heavily hinders negative values from propagating through the network. Consequently, deep neural networks have not benefited from negative representations. In this work, an activation function called Flatten-T Swish (FTS) that leverages the benefit of negative values is proposed. To verify its performance, this study evaluates FTS against ReLU and several recent activation functions. Each activation function is trained on the MNIST dataset using five different deep fully connected neural networks (DFNNs) with depths varying from five to eight layers. For a fair evaluation, all DFNNs use the same configuration settings. Based on the experimental results, FTS with a threshold value of T=-0.20 has the best overall performance. Compared with ReLU, FTS (T=-0.20) improves MNIST classification accuracy by 0.13%, 0.70%, 0.67%, 1.07%, and 1.15% on the wider 5-layer, slimmer 5-layer, 6-layer, 7-layer, and 8-layer DFNNs, respectively. The study also observes that FTS converges twice as fast as ReLU. Although other existing activation functions are also evaluated, this study uses ReLU as the baseline activation function.

* International Journal of Advances in Intelligent Informatics, 4(2), 76-86
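
To make the contrast with ReLU's hard zero concrete, here is a small sketch. The abstract gives the threshold T=-0.20 but not the formula itself; the code assumes the thresholded ReLU-Swish form suggested by the title, f(x) = x * sigmoid(x) + T for x >= 0 and f(x) = T otherwise. The function names `relu` and `fts` are illustrative.

```python
import math


def relu(x: float) -> float:
    """Standard ReLU: negative inputs are cut to exactly zero."""
    return max(0.0, x)


def fts(x: float, t: float = -0.20) -> float:
    """Flatten-T Swish under the assumed form described in the lead-in.

    Non-negative inputs follow x * sigmoid(x) shifted by the threshold t;
    negative inputs are flattened to the constant t rather than to zero.
    """
    if x >= 0:
        return x / (1.0 + math.exp(-x)) + t
    return t


# Compare the two activations on a few sample inputs.
for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"x={x:+.1f}  relu={relu(x):+.4f}  fts={fts(x):+.4f}")
```

With a negative T, every negative input yields the same small negative output, so in this sketch the layer's mean activation is pulled below what ReLU would produce for the same inputs.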
