Abstract:Hand-eye calibration, as a fundamental task in vision-based robotic systems, aims to estimate the transformation matrix between the coordinate frame of the camera and the robot flange. Most approaches to hand-eye calibration rely on external markers or human assistance. We proposed Look at Robot Base Once (LRBO), a novel methodology that addresses the hand-eye calibration problem without external calibration objects or human support, but with the robot base. Using point clouds of the robot base, a transformation matrix from the coordinate frame of the camera to the robot base is established as I=AXB. To this end, we exploit learning-based 3D detection and registration algorithms to estimate the location and orientation of the robot base. The robustness and accuracy of the method are quantified by ground-truth-based evaluation, and the accuracy result is compared with other 3D vision-based calibration methods. To assess the feasibility of our methodology, we carried out experiments utilizing a low-cost structured light scanner across varying joint configurations and groups of experiments. The proposed hand-eye calibration method achieved a translation deviation of 0.930 mm and a rotation deviation of 0.265 degrees according to the experimental results. Additionally, the 3D reconstruction experiments demonstrated a rotation error of 0.994 degrees and a position error of 1.697 mm. Moreover, our method offers the potential to be completed in 1 second, which is the fastest compared to other 3D hand-eye calibration methods. Code is released at github.com/leihui6/LRBO.
Abstract:Composed image retrieval is a type of image retrieval task where the user provides a reference image as a starting point and specifies a text on how to shift from the starting point to the desired target image. However, most existing methods focus on the composition learning of text and reference images and oversimplify the text as a description, neglecting the inherent structure and the user's shifting intention of the texts. As a result, these methods typically take shortcuts that disregard the visual cue of the reference images. To address this issue, we reconsider the text as instructions and propose a Semantic Shift network (SSN) that explicitly decomposes the semantic shifts into two steps: from the reference image to the visual prototype and from the visual prototype to the target image. Specifically, SSN explicitly decomposes the instructions into two components: degradation and upgradation, where the degradation is used to picture the visual prototype from the reference image, while the upgradation is used to enrich the visual prototype into the final representations to retrieve the desired target image. The experimental results show that the proposed SSN demonstrates a significant improvement of 5.42% and 1.37% on the CIRR and FashionIQ datasets, respectively, and establishes a new state-of-the-art performance. Codes will be publicly available.
Abstract:Spiking neural networks (SNNs) receive widespread attention because of their low-power hardware characteristic and brain-like signal response mechanism, but currently, the performance of SNNs is still behind Artificial Neural Networks (ANNs). We build an information theory-inspired system called Stochastic Probability Adjustment (SPA) system to reduce this gap. The SPA maps the synapses and neurons of SNNs into a probability space where a neuron and all connected pre-synapses are represented by a cluster. The movement of synaptic transmitter between different clusters is modeled as a Brownian-like stochastic process in which the transmitter distribution is adaptive at different firing phases. We experimented with a wide range of existing unsupervised SNN architectures and achieved consistent performance improvements. The improvements in classification accuracy have reached 1.99% and 6.29% on the MNIST and EMNIST datasets respectively.
Abstract:Spiking Neural Network (SNN), as a brain-inspired approach, is attracting attentions due to its potential to produce ultra-high-energy-efficient hardware. Competitive learning based on Spike-Timing-Dependent Plasticity (STDP) is a popular method to train unsupervised SNN. However, previous unsupervised SNNs trained through this method are limited to shallow networks with only one learnable layer and can't achieve satisfactory results when compared with multi-layer SNNs. In this paper, we ease this limitation by: 1)We propose Spiking Inception (Sp-Inception) module, inspired by the Inception module in Artificial Neural Network (ANN) literature. This module is trained through STDP- based competitive learning and outperforms baseline modules on learning capability, learning efficiency, and robustness; 2)We propose Pooling-Reshape-Activate (PRA) layer to make Sp-Inception module stackable; 3)We stack multiple Sp-Inception modules to construct multi-layer SNNs. Our method greatly exceeds baseline methods on image classification tasks and reaches state-of-the-art results on MNIST dataset among existing unsupervised SNNs.
Abstract:Spiking Neural Network (SNN), as a brain-inspired machine learning algorithm, is closer to the computing mechanism of human brain and more suitable to reveal the essence of intelligence compared with Artificial Neural Networks (ANN), attracting more and more attention in recent years. In addition, the information processed by SNN is in the form of discrete spikes, which makes SNN have low power consumption characteristics. In this paper, we propose an efficient and strong unsupervised SNN named BioSNet with high biological plausibility to handle image classification tasks. In BioSNet, we propose a new biomimetic spiking neuron model named MRON inspired by 'recognition memory' in the human brain, design an efficient and robust network architecture corresponding to biological characteristics of the human brain as well, and extend the traditional voting mechanism to the Vote-for-All (VFA) decoding layer so as to reduce information loss during decoding. Simulation results show that BioSNet not only achieves state-of-the-art unsupervised classification accuracy on MNIST/EMNIST data sets, but also exhibits superior learning efficiency and high robustness. Specifically, the BioSNet trained with only dozens of samples per class can achieve a favorable classification accuracy over 80% and randomly deleting even 95% of synapses or neurons in the BioSNet only leads to slight performance degradation.