Abstract:This study proves the two-phase dynamics of a deep neural network (DNN) learning interactions. Despite the long disappointing view of the faithfulness of post-hoc explanation of a DNN, in recent years, a series of theorems have been proven to show that given an input sample, a small number of interactions between input variables can be considered as primitive inference patterns, which can faithfully represent every detailed inference logic of the DNN on this sample. Particularly, it has been observed that various DNNs all learn interactions of different complexities with two-phase dynamics, and this well explains how a DNN's generalization power changes from under-fitting to over-fitting. Therefore, in this study, we prove the dynamics of a DNN gradually encoding interactions of different complexities, which provides a theoretically grounded mechanism for the over-fitting of a DNN. Experiments show that our theory well predicts the real learning dynamics of various DNNs on different tasks.
Abstract:Two important problems in the field of Topological Data Analysis are defining practical multifiltrations on objects and showing ability of TDA to detect the geometry. Motivated by the problems, we constuct three multifiltrations named multi-GENEO, multi-DGENEO and mix-GENEO, and prove the stability of both the interleaving distance and multiparameter persistence landscape of multi-GENEO with respect to the pseudometric of the subspace of bounded functions. We also give the estimations of upper bound for multi-DGENEO and mix-GENEO. Finally, we provide experiment results on MNIST dataset to demonstrate our bifiltrations have ability to detect geometric and topological differences of digital images.
Abstract:This paper mathematically derives an analytic solution of the adversarial perturbation on a ReLU network, and theoretically explains the difficulty of adversarial training. Specifically, we formulate the dynamics of the adversarial perturbation generated by the multi-step attack, which shows that the adversarial perturbation tends to strengthen eigenvectors corresponding to a few top-ranked eigenvalues of the Hessian matrix of the loss w.r.t. the input. We also prove that adversarial training tends to strengthen the influence of unconfident input samples with large gradient norms in an exponential manner. Besides, we find that adversarial training strengthens the influence of the Hessian matrix of the loss w.r.t. network parameters, which makes the adversarial training more likely to oscillate along directions of a few samples, and boosts the difficulty of adversarial training. Crucially, our proofs provide a unified explanation for previous findings in understanding adversarial training.