Abstract:The recently proposed open-source KAZE image feature detection and description algorithm offers unprecedented performance in comparison to conventional ones like SIFT and SURF as it relies on nonlinear scale spaces instead of Gaussian linear scale spaces. The improved performance, however, comes with a significant computational cost limiting its use for many applications. We report a GPGPU implementation of the KAZE algorithm without resorting to binary descriptors for gaining speedup. For a 1920 by 1200 sized image our Compute Unified Device Architecture (CUDA) C based GPU version took around 300 milliseconds on a NVIDIA GeForce GTX Titan X (Maxwell Architecture-GM200) card in comparison to nearly 2400 milliseconds for a multithreaded CPU version (16 threaded Intel(R) Xeon(R) CPU E5-2650 processsor). The CUDA based parallel implementation is described in detail with fine-grained comparison between the GPU and CPU implementations. By achieving nearly 8 fold speedup without performance degradation our work expands the applicability of the KAZE algorithm. Additionally, the strategies described here can prove useful for the GPU implementation of other nonlinear scale space based methods.