Adamos Solomou

On the Second-order Convergence Properties of Random Search Methods

Oct 25, 2021

Aurelien Lucchi, Antonio Orvieto, Adamos Solomou

Abstract: We study the theoretical convergence properties of random-search methods when optimizing non-convex objective functions without access to derivatives. We prove that standard random-search methods that do not rely on second-order information converge to a second-order stationary point. However, their complexity is exponential in the input dimension of the problem. To address this issue, we propose a novel variant of random search that exploits negative curvature while relying only on function evaluations. We prove that this approach converges to a second-order stationary point at a much faster rate than vanilla methods: namely, the complexity in terms of the number of function evaluations is only linear in the problem dimension. We test our algorithm empirically and find good agreement with our theoretical results.

* NeurIPS 2021
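
To make the setting in the abstract concrete, here is a minimal sketch of a generic derivative-free random-search step: sample a random unit direction, evaluate the objective at the current point and at one step forward and backward, and keep the best of the three. It is an illustration under stated assumptions, not the algorithm proposed in the paper; the names `random_search` and `saddle`, the fixed step size, and the test function are all hypothetical choices made for this example.

```python
import numpy as np

def random_search(f, x0, step_size=0.1, n_iters=500, seed=0):
    """Generic random search using only function evaluations (illustrative)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    history = [f(x)]
    for _ in range(n_iters):
        # Sample a direction uniformly on the unit sphere.
        u = rng.standard_normal(x.shape)
        u /= np.linalg.norm(u)
        # Compare the current point with one step in each direction
        # and keep whichever has the lowest objective value.
        candidates = [x, x + step_size * u, x - step_size * u]
        x = min(candidates, key=f)
        history.append(f(x))
    return x, history

def saddle(x):
    # Hypothetical non-convex test function: the origin is a strict saddle
    # (zero gradient, negative curvature along the second coordinate).
    return x[0] ** 2 - x[1] ** 2 + 0.25 * x[1] ** 4

# Start exactly at the saddle point, where a gradient step would not move.
x_final, history = random_search(saddle, x0=[0.0, 0.0])
```

Because the current point is always one of the candidates, the objective value never increases from one iteration to the next. Starting at the saddle, a step is accepted only when the sampled direction has enough of a component along the negative-curvature coordinate. In two dimensions that happens often; how the cost of finding such directions scales with dimension is the question the abstract addresses.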

On Learning the Transformer Kernel

Oct 15, 2021

Sankalan Pal Chowdhury, Adamos Solomou, Avinava Dubey, Mrinmaya Sachan

Abstract: In this work we introduce KERNELIZED TRANSFORMER, a generic, scalable, data-driven framework for learning the kernel function in Transformers. Our framework approximates the Transformer kernel as a dot product between spectral feature maps and learns the kernel by learning the spectral distribution. This not only helps in learning a generic kernel end-to-end, but also reduces the time and space complexity of Transformers from quadratic to linear. We show that KERNELIZED TRANSFORMERS achieve performance comparable to existing efficient Transformer architectures in terms of both accuracy and computational efficiency. Our study also demonstrates that the choice of kernel has a substantial impact on performance, and that kernel-learning variants are competitive alternatives to fixed-kernel Transformers on both long- and short-sequence tasks.

* 26 pages, of which 11 form the appendix; 6 figures, of which 2 are in the appendix
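
The idea of writing a kernel as a dot product between spectral feature maps, and the resulting drop from quadratic to linear cost, can be sketched with random Fourier features. The example below is an assumption-laden illustration rather than the paper's method: the frequencies are drawn once from a fixed standard Gaussian (which corresponds to a Gaussian kernel) instead of being learned, attention normalization is omitted, and the names `spectral_features`, `Q`, `K`, `V`, and `freqs` are hypothetical.

```python
import numpy as np

def spectral_features(x, freqs):
    """Map rows of x to random Fourier features [cos(Wx), sin(Wx)] / sqrt(m)."""
    proj = x @ freqs.T  # shape (n, m)
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=-1) / np.sqrt(freqs.shape[0])

rng = np.random.default_rng(0)
n, d, m = 8, 4, 4096  # sequence length, head dimension, number of frequencies

Q = 0.5 * rng.standard_normal((n, d))
K = 0.5 * rng.standard_normal((n, d))
V = rng.standard_normal((n, d))

# Assumption: fixed Gaussian spectral samples; a learned spectral
# distribution would replace this line.
freqs = rng.standard_normal((m, d))

phi_q = spectral_features(Q, freqs)
phi_k = spectral_features(K, freqs)

# Quadratic ordering: forms the full n x n kernel matrix first.
quadratic = (phi_q @ phi_k.T) @ V

# Linear ordering: same result by associativity, without any n x n matrix.
linear = phi_q @ (phi_k.T @ V)

# Exact Gaussian kernel that the feature map approximates.
sq_dists = ((Q[:, None, :] - K[None, :, :]) ** 2).sum(axis=-1)
exact_kernel = np.exp(-0.5 * sq_dists)
approx_kernel = phi_q @ phi_k.T
```

The two orderings differ only in where the parentheses go. The second one costs time proportional to the sequence length `n` rather than its square, which is the source of the linear complexity mentioned in the abstract.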
