A. Aydin Alatan

IG-SLAM: Instant Gaussian SLAM

Aug 07, 2024

F. Aykut Sarikamis, A. Aydin Alatan

Abstract: 3D Gaussian Splatting has recently shown promising results as an alternative scene representation in SLAM systems to neural implicit representations. However, current methods either lack dense depth maps to supervise the mapping process or detailed training designs that consider the scale of the environment. To address these drawbacks, we present IG-SLAM, a dense RGB-only SLAM system that employs robust Dense-SLAM methods for tracking and combines them with Gaussian Splatting. A 3D map of the environment is constructed using accurate pose and dense depth provided by tracking. Additionally, we utilize depth uncertainty in map optimization to improve 3D reconstruction. Our decay strategy in map optimization enhances convergence and allows the system to run at 10 fps in a single process. We demonstrate competitive performance with state-of-the-art RGB-only SLAM systems while achieving faster operation speeds. We present our experiments on the Replica, TUM-RGBD, ScanNet, and EuRoC datasets. The system achieves photo-realistic 3D reconstruction in large-scale sequences, particularly in the EuRoC dataset.

* 8 pages, 3 page ref, 5 figures

Knowledge Distillation Layer that Lets the Student Decide

Sep 06, 2023

Ada Gorgun, Yeti Z. Gurbuz, A. Aydin Alatan

Abstract: A typical technique in knowledge distillation (KD) is to regularize the learning of a limited-capacity model (student) by pushing its responses to match those of a powerful model (teacher). Albeit useful, especially in the penultimate layer and beyond, its action on the student's feature transform is rather implicit, limiting its practice in the intermediate layers. To explicitly embed the teacher's knowledge in the feature transform, we propose a learnable KD layer for the student, which improves KD with two distinct abilities: i) learning how to leverage the teacher's knowledge, enabling it to discard nuisance information, and ii) feeding forward the transferred knowledge deeper. Thus, the student enjoys the teacher's knowledge during inference as well as training. Formally, we repurpose a 1x1-BN-ReLU-1x1 convolution block to assign a semantic vector to each local region according to the template (supervised by the teacher) that the corresponding region of the student matches. To facilitate template learning in the intermediate layers, we propose a novel form of supervision based on the teacher's decisions. Through rigorous experimentation, we demonstrate the effectiveness of our approach on 3 popular classification benchmarks. Code is available at: https://github.com/adagorgun/letKD-framework

* Accepted at the British Machine Vision Conference 2023 (BMVC 2023)

Generalizable Embeddings with Cross-batch Metric Learning

Jul 24, 2023

Yeti Z. Gurbuz, A. Aydin Alatan

Abstract: Global average pooling (GAP) is a popular component in deep metric learning (DML) for aggregating features. Its effectiveness is often attributed to treating each feature vector as a distinct semantic entity and GAP as a combination of them. Albeit substantiated, the algorithmic implications of such an explanation for learning generalizable entities that represent unseen classes, a crucial DML goal, remain unclear. To address this, we formulate GAP as a convex combination of learnable prototypes. We then show that the prototype learning can be expressed as a recursive process fitting a linear predictor to a batch of samples. Building on that perspective, we consider two batches of disjoint classes at each iteration and regularize the learning by expressing the samples of a batch with the prototypes that are fitted to the other batch. We validate our approach on 4 popular DML benchmarks.

* © 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Feature Embedding by Template Matching as a ResNet Block

Oct 03, 2022

Ada Gorgun, Yeti Z. Gurbuz, A. Aydin Alatan

Abstract: Convolution blocks serve as local feature extractors and are key to the success of neural networks. To make local semantic feature embedding more explicit, we reformulate convolution blocks as feature selection according to the best matching kernel. In this manner, we show that typical ResNet blocks indeed perform local feature embedding via template matching once batch normalization (BN) followed by a rectified linear unit (ReLU) is interpreted as an arg-max optimizer. Following this perspective, we tailor a residual block that explicitly forces semantically meaningful local feature embedding by using label information. Specifically, we assign a feature vector to each local region according to the classes that the corresponding region matches. We evaluate our method on three popular benchmark datasets with several architectures for image classification and consistently show that our approach substantially improves the performance of the baseline architectures.

Deep Metric Learning with Chance Constraints

Sep 19, 2022

Yeti Z. Gurbuz, Ogul Can, A. Aydin Alatan

Abstract: Deep metric learning (DML) aims to minimize the empirical expected loss of the pairwise intra-/inter-class proximity violations in the embedding image. We relate DML to the feasibility problem of finite chance constraints. We show that the minimizer of proxy-based DML satisfies certain chance constraints, and that the worst-case generalization performance of the proxy-based methods can be characterized by the radius of the smallest ball around a class proxy that covers the entire domain of the corresponding class samples, suggesting that multiple proxies per class help performance. To provide a scalable algorithm as well as to exploit more proxies, we consider the chance constraints implied by the minimizers of proxy-based DML instances and reformulate DML as finding a feasible point in the intersection of such constraints, resulting in a problem to be approximately solved by iterative projections. Simply put, we repeatedly train a regularized proxy-based loss and re-initialize the proxies with the embeddings of deliberately selected new samples. We apply our method with well-accepted losses and evaluate on four popular benchmark datasets for image retrieval. Outperforming the state of the art, our method consistently improves the performance of the applied losses. Code is available at: https://github.com/yetigurbuz/ccp-dml

* Under review at IEEE Transactions on Neural Networks and Learning Systems

Improved Hard Example Mining Approach for Single Shot Object Detectors

Feb 26, 2022

Aybora Koksal, Onder Tuzcuoglu, Kutalmis Gokalp Ince, Yoldas Ataseven, A. Aydin Alatan

Abstract: Hard example mining methods generally improve the performance of object detectors, which suffer from imbalanced training sets. In this work, two existing hard example mining approaches (LRM and focal loss, FL) are adapted and combined in a state-of-the-art real-time object detector, YOLOv5. The effectiveness of the proposed approach for improving the performance on hard examples is extensively evaluated. The proposed method increases mAP by 3% compared to using the original loss function and by around 1-2% compared to using the hard-mining methods (LRM or FL) individually on the 2021 Anti-UAV Challenge Dataset.

* 5 pages, 2 figures, 7 tables. The code is available at https://github.com/aybora/yolov5Loss
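
The focal loss mentioned in the abstract above can be sketched for a single binary prediction as follows. This is a minimal illustration using the standard focal loss formula, not the authors' YOLOv5 code. The helper `keep_hardest` is a hypothetical, simplified top-k selection that only loosely mirrors the idea of ranking examples by loss; the function names, the `gamma` and `alpha` defaults, and the toy predictions are all assumptions made for this example.

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Standard binary focal loss for one prediction.

    p: predicted probability of the positive class, y: label (1 or 0).
    gamma down-weights easy examples; alpha balances the two classes.
    """
    eps = 1e-12  # guards against log(0)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))

def keep_hardest(losses, k):
    """Hypothetical helper: indices of the k largest losses, in ascending index order."""
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return sorted(order[:k])

# Toy data (assumed): three positives of decreasing confidence and one easy negative.
preds = [0.95, 0.60, 0.10, 0.02]
labels = [1, 1, 1, 0]

losses = [focal_loss(p, y) for p, y in zip(preds, labels)]
hard = keep_hardest(losses, k=2)
mined_loss = sum(losses[i] for i in hard) / len(hard)
print(hard, mined_loss)
```

In this toy setting the two poorly classified positives dominate the mined loss, while the confident predictions contribute almost nothing, which is the behaviour hard example mining aims for.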

Effect of Parameter Optimization on Classical and Learning-based Image Matching Methods

Aug 28, 2021

Ufuk Efe, Kutalmis Gokalp Ince, A. Aydin Alatan

Abstract: Deep learning-based image matching methods have improved significantly in recent years. Although these methods are reported to outperform the classical techniques, the performance of the classical methods has not been examined in detail. In this study, we compare classical and learning-based methods by employing mutual nearest neighbor search with a ratio test and optimizing the ratio test threshold to achieve the best performance on two different performance metrics. After a fair comparison, the experimental results on the HPatches dataset reveal that the performance gap between classical and learning-based methods is not that significant. Throughout the experiments, we demonstrate that SuperGlue is the state-of-the-art technique for the image matching problem on the HPatches dataset. However, if a single parameter, namely the ratio test threshold, is carefully optimized, the well-known traditional method SIFT performs quite close to SuperGlue and even outperforms it in terms of mean matching accuracy (MMA) under 1- and 2-pixel thresholds. Moreover, a recent approach, DFM, which only uses pre-trained VGG features as descriptors and a ratio test, is shown to outperform most of the well-trained learning-based methods. Therefore, we conclude that the parameters of any classical method should be analyzed carefully before comparing against a learning-based technique.

* 8 pages, 2 figures, 3 tables, ICCV 2021 TradiCV Workshop
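
Mutual nearest neighbor search with a ratio test, as named in the abstract above, can be sketched in plain Python as follows. This is an illustrative sketch, not the paper's implementation: applying the ratio test in both directions, the Euclidean distance, the function names, and the toy 2-D descriptors are assumptions. Each descriptor set must contain at least two entries so that a second-nearest neighbor exists.

```python
import math

def l2(a, b):
    """Euclidean distance between two descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def ratio_test_nn(src, dst, ratio):
    """Map each src index to its nearest dst index if it passes the ratio test."""
    out = {}
    for i, d in enumerate(src):
        ranked = sorted(range(len(dst)), key=lambda j: l2(d, dst[j]))
        best, second = ranked[0], ranked[1]
        # Keep the match only if the best distance is clearly smaller than the runner-up.
        if l2(d, dst[best]) < ratio * l2(d, dst[second]):
            out[i] = best
    return out

def mutual_matches(desc_a, desc_b, ratio=0.8):
    """Keep pairs (i, j) where i -> j and j -> i both survive the ratio test."""
    ab = ratio_test_nn(desc_a, desc_b, ratio)
    ba = ratio_test_nn(desc_b, desc_a, ratio)
    return sorted((i, j) for i, j in ab.items() if ba.get(j) == i)

# Toy descriptors (assumed): the last two entries of desc_b are near-duplicates.
desc_a = [[0.0, 0.0], [10.0, 0.0], [5.0, 5.0]]
desc_b = [[0.1, 0.0], [10.0, 0.2], [5.2, 4.9], [5.1, 5.1]]

loose = mutual_matches(desc_a, desc_b, ratio=0.8)
strict = mutual_matches(desc_a, desc_b, ratio=0.6)
print(loose, strict)
```

With these toy descriptors the ambiguous third match survives at a threshold of 0.8 but is rejected at 0.6, which illustrates why this single threshold can noticeably change the reported matching performance.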

DFM: A Performance Baseline for Deep Feature Matching

Jun 14, 2021

Ufuk Efe, Kutalmis Gokalp Ince, A. Aydin Alatan

Abstract: A novel image matching method is proposed that utilizes learned features extracted by an off-the-shelf deep neural network to obtain promising performance. The proposed method uses a pre-trained VGG architecture as a feature extractor and does not require any additional training specific to improving matching. Inspired by well-established concepts in psychology, such as the Mental Rotation paradigm, an initial warping is performed as a result of a preliminary geometric transformation estimate. These estimates are simply based on dense matching of nearest neighbors at the terminal layer of the VGG network outputs of the images to be matched. After this initial alignment, the same approach is repeated between the reference and aligned images in a hierarchical manner to reach good localization and matching performance. Our algorithm achieves overall scores of 0.57 and 0.80 in terms of Mean Matching Accuracy (MMA) for 1-pixel and 2-pixel thresholds, respectively, on the HPatches dataset, which indicates better performance than the state of the art.

* CVPR 2021 Image Matching Workshop Camera Ready Version
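
The Mean Matching Accuracy (MMA) metric quoted in the abstract above can be illustrated with a short sketch. It assumes that each image pair comes with a ground-truth homography, that a match counts as correct when its reprojection error is at most the pixel threshold, and that per-pair accuracies are averaged; the exact evaluation protocol may differ. The function names and the toy matches are hypothetical.

```python
import math

def project(H, pt):
    """Apply a 3x3 homography H (nested lists) to a 2-D point."""
    x, y = pt
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

def matching_accuracy(matches, H, threshold):
    """Fraction of matches whose reprojection error is within the threshold."""
    if not matches:
        return 0.0
    correct = 0
    for pt_a, pt_b in matches:
        qx, qy = project(H, pt_a)
        if math.hypot(qx - pt_b[0], qy - pt_b[1]) <= threshold:
            correct += 1
    return correct / len(matches)

def mean_matching_accuracy(pairs, threshold):
    """Average the per-pair accuracy over a list of (matches, homography) pairs."""
    return sum(matching_accuracy(m, H, threshold) for m, H in pairs) / len(pairs)

# Toy image pair (assumed): the ground truth is a pure translation by (2, 3).
H = [[1.0, 0.0, 2.0], [0.0, 1.0, 3.0], [0.0, 0.0, 1.0]]
matches = [((0.0, 0.0), (2.0, 3.0)),    # error 0.0
           ((1.0, 1.0), (3.5, 4.0)),    # error 0.5
           ((4.0, 4.0), (6.0, 8.5)),    # error 1.5
           ((5.0, 5.0), (10.0, 10.0))]  # error about 3.6

mma_1px = mean_matching_accuracy([(matches, H)], threshold=1.0)
mma_2px = mean_matching_accuracy([(matches, H)], threshold=2.0)
print(mma_1px, mma_2px)
```

Reporting the metric at both 1 and 2 pixels, as the abstract does, shows how much of the accuracy depends on sub-pixel-level localization rather than on finding roughly correct correspondences.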

Semi-Automatic Video Annotation For Object Detection

Jan 24, 2021

Kutalmis Gokalp Ince, Aybora Koksal, Arda Fazla, A. Aydin Alatan

Abstract: In this study, a semi-automatic video annotation method is proposed which utilizes temporal information to eliminate false positives with a tracking-by-detection approach by employing multiple hypothesis tracking (MHT). The MHT method automatically forms tracklets, which are confirmed by human operators to enlarge the training set. A novel incremental learning approach helps to annotate videos in an iterative way. The experiments performed on the AUTH Multidrone Dataset reveal that the annotation workload can be reduced by up to 96% with the proposed approach.

* Submitted to ICIP 2021

Late Temporal Modeling in 3D CNN Architectures with BERT for Action Recognition

Aug 19, 2020

M. Esat Kalfaoglu, Sinan Kalkan, A. Aydin Alatan

Abstract: In this work, we combine 3D convolution with late temporal modeling for action recognition. To this end, we replace the conventional Temporal Global Average Pooling (TGAP) layer at the end of a 3D convolutional architecture with a Bidirectional Encoder Representations from Transformers (BERT) layer in order to better utilize the temporal information with BERT's attention mechanism. We show that this replacement improves the performance of many popular 3D convolution architectures for action recognition, including ResNeXt, I3D, SlowFast and R(2+1)D. Moreover, we provide state-of-the-art results on both the HMDB51 and UCF101 datasets with 85.10% and 98.69% top-1 accuracy, respectively. The code is publicly available.
