
Ahmed Salah

Controlling Grokking with Nonlinearity and Data Symmetry

Nov 08, 2024

Ahmed Salah, David Yevick


Abstract: This paper demonstrates that grokking in a neural network trained on modular arithmetic with modulus P can be controlled by modifying the profile of the activation function as well as the depth and width of the model. Further, plotting the even PCA projections of the weights of the last layer against their odd projections yields patterns that become significantly more uniform as the nonlinearity is increased by adding layers. These patterns can be used to factor P when P is not prime. Finally, a metric for the generalization ability of the network is inferred from the entropy of the layer weights, while the degree of nonlinearity is related to correlations between the local entropies of the weights of the neurons in the final layer.

* 15 pages, 14 figures
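As an illustration only, the sketch below builds the kind of modular-arithmetic dataset on which grokking is usually studied, together with one simple way to compute an entropy of a layer's weights. Both pieces are assumptions: the abstract says only "modular arithmetic with a modulus P", so addition is assumed here, and the histogram-based `weight_entropy` helper (including its bin count) is hypothetical rather than the paper's stated metric.

```python
import math
import random
from collections import Counter

def modular_addition_dataset(p, train_fraction=0.5, seed=0):
    """Return (train, test) lists of ((a, b), (a + b) % p) examples.

    Assumption: the operation is addition; the abstract does not say which.
    """
    pairs = [((a, b), (a + b) % p) for a in range(p) for b in range(p)]
    rng = random.Random(seed)
    rng.shuffle(pairs)  # random train/test split of all p * p pairs
    n_train = int(train_fraction * len(pairs))
    return pairs[:n_train], pairs[n_train:]

def weight_entropy(weights, n_bins=32):
    """Shannon entropy (in nats) of a histogram of weight values.

    Hypothetical helper: the histogram definition and n_bins are assumptions.
    """
    lo, hi = min(weights), max(weights)
    if hi == lo:
        return 0.0  # all weights identical: one occupied bin, zero entropy
    width = (hi - lo) / n_bins
    # Map each weight to a bin index, clamping the maximum into the last bin.
    counts = Counter(min(int((w - lo) / width), n_bins - 1) for w in weights)
    total = len(weights)
    return -sum((c / total) * math.log(c / total) for c in counts.values())
```

For example, `modular_addition_dataset(97, train_fraction=0.3)` splits all 9409 pairs for P = 97 into a 30% training set and a held-out test set; grokking refers to test accuracy rising long after training accuracy has saturated.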

Branched Variational Autoencoder Classifiers

Jan 04, 2024

Ahmed Salah, David Yevick

Abstract: This paper introduces a modified variational autoencoder (VAE) that contains an additional neural network branch. The resulting branched VAE (BVAE) adds a classification term, computed from the class labels, to the total loss and thereby imparts categorical information to the latent representation. As a result, the latent-space distributions of the input classes are separated and ordered, which improves classification accuracy. The degree of improvement is quantified by numerical calculations on the benchmark MNIST dataset for both unrotated and rotated digits. The proposed technique is then compared to, and incorporated into, a VAE with fixed output distributions, and this combination is found to yield improved performance for a wide range of output distributions.
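A minimal sketch of how a classification term might be added to a standard VAE loss, assuming a binary cross-entropy reconstruction term (common for MNIST pixels in [0, 1]), a diagonal Gaussian posterior with a standard normal prior, and a softmax cross-entropy for the branch. The function names and the weighting parameter `alpha` are hypothetical; the abstract does not state how the terms are scaled or combined.

```python
import math

def bce_reconstruction(x, x_hat, eps=1e-7):
    # Binary cross-entropy summed over pixels; x_hat is clipped to avoid log(0).
    total = 0.0
    for xi, pi in zip(x, x_hat):
        pi = min(max(pi, eps), 1 - eps)
        total -= xi * math.log(pi) + (1 - xi) * math.log(1 - pi)
    return total

def gaussian_kl(mu, log_var):
    # KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions.
    return -0.5 * sum(1 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))

def cross_entropy(logits, label):
    # Softmax cross-entropy for one example, computed stably via log-sum-exp.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]

def bvae_loss(x, x_hat, mu, log_var, class_logits, label, alpha=1.0):
    """Hypothetical BVAE objective: VAE loss plus a weighted classifier loss.

    alpha is an assumed weighting of the classification branch.
    """
    vae = bce_reconstruction(x, x_hat) + gaussian_kl(mu, log_var)
    return vae + alpha * cross_entropy(class_logits, label)
```

Because the classifier branch reads from the latent code, minimizing the extra cross-entropy term pushes latent vectors of different classes apart, which is the mechanism the abstract credits for the separated and ordered latent distributions.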
