Abstract: Visual abstract reasoning tasks present challenges for deep neural networks, exposing limitations in their capabilities. In this work, we present a neural network model that addresses the challenges posed by Raven's Progressive Matrices (RPM). Inspired by the two-stream hypothesis of visual processing, we introduce the Dual-stream Reasoning Network (DRNet), which utilizes two parallel branches to capture image features. On top of the two streams, a reasoning module first learns to merge the high-level features of the same image. Then, it employs a rule extractor to handle combinations involving the eight context images and each candidate image, extracting discrete abstract rules and utilizing a multilayer perceptron (MLP) to make predictions. Empirical results demonstrate that the proposed DRNet achieves state-of-the-art average performance across multiple RPM benchmarks. Furthermore, DRNet demonstrates robust generalization capabilities, even extending to various out-of-distribution scenarios. The dual streams within DRNet serve distinct functions by addressing local or spatial information. They are then integrated into the reasoning module, leveraging abstract rules to facilitate the execution of visual reasoning tasks. These findings indicate that the dual-stream architecture could play a crucial role in visual abstract reasoning.
Abstract: In this paper, we develop a new classification method for manifold-valued data in the framework of probabilistic learning vector quantization. In many classification scenarios, the data can be naturally represented by symmetric positive definite matrices, which are inherently points on a curved Riemannian manifold. Due to the non-Euclidean geometry of Riemannian manifolds, traditional Euclidean machine learning algorithms yield poor results on such data. We generalize the probabilistic learning vector quantization algorithm to data points living on the manifold of symmetric positive definite matrices equipped with the natural Riemannian metric (the affine-invariant metric). By exploiting the induced Riemannian distance, we derive the probabilistic learning Riemannian space quantization algorithm, obtaining the learning rule through Riemannian gradient descent. Empirical investigations on synthetic data, image data, and motor imagery EEG data demonstrate the superior performance of the proposed method.
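
To make the induced Riemannian distance concrete, the affine-invariant distance between two symmetric positive definite matrices P and Q can be written as the square root of the sum of squared logarithms of the eigenvalues of P^{-1} Q. The sketch below is only an illustration restricted to the 2x2 case, where those eigenvalues have a closed form; the function name `airm_distance_2x2` is hypothetical, and the abstract does not say how the authors compute the distance.

```python
import math


def airm_distance_2x2(P, Q):
    """Affine-invariant Riemannian distance between two 2x2 SPD matrices.

    Uses d(P, Q) = sqrt(sum_i log(lambda_i)^2), where lambda_i are the
    eigenvalues of P^{-1} Q. The closed form below only holds for 2x2 input.
    """
    (p11, p12), (_, p22) = P
    (q11, q12), (_, q22) = Q
    det_p = p11 * p22 - p12 * p12
    det_q = q11 * q22 - q12 * q12
    if p11 <= 0 or det_p <= 0 or q11 <= 0 or det_q <= 0:
        raise ValueError("both matrices must be symmetric positive definite")

    # Trace and determinant of P^{-1} Q, without forming the product.
    trace = (p22 * q11 - 2.0 * p12 * q12 + p11 * q22) / det_p
    det = det_q / det_p

    # Roots of lambda^2 - trace * lambda + det = 0.
    disc = max(trace * trace - 4.0 * det, 0.0)  # clamp rounding noise
    lam_max = 0.5 * (trace + math.sqrt(disc))
    lam_min = det / lam_max  # avoids subtractive cancellation

    return math.hypot(math.log(lam_max), math.log(lam_min))
```

The distance is zero for identical matrices and is unchanged when both matrices are transformed as A P A^T and A Q A^T with an invertible A, which is the property the name "affine-invariant" refers to. For larger matrices, such as EEG covariance matrices, a general eigenvalue routine would be needed instead of the closed form.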
Abstract: We propose a neurobiologically inspired visual simultaneous localization and mapping (SLAM) system, based on the direct sparse method, that builds cognitive maps of large-scale environments in real time from a moving stereo camera. The core SLAM system mainly comprises a Bayesian attractor network, which utilizes neural responses of head direction (HD) cells in the hippocampus and grid cells in the medial entorhinal cortex (MEC) to represent the head direction and the position of the robot in the environment, respectively. The direct sparse method is employed to accurately and robustly estimate velocity information from the stereo camera. Input rotational and translational velocities are integrated by the HD cell and grid cell networks, respectively. We demonstrate our neurobiologically inspired stereo visual SLAM system on the KITTI odometry benchmark datasets. The proposed system robustly builds a coherent semi-metric topological map in real time from a stereo camera. Qualitative evaluation of the cognitive maps shows that the proposed system outperforms our previous brain-inspired algorithms and the neurobiologically inspired monocular visual SLAM system in terms of both tracking accuracy and robustness, and that it comes closer to the traditional state-of-the-art system.
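
As a rough illustration of what integrating the rotational and translational velocities means, the sketch below performs plain planar dead reckoning: the rotational velocity updates the heading (the quantity the HD cell network represents) and the translational velocity updates the position (the quantity the grid cell network represents). It is not the Bayesian attractor network from the abstract; the function name `integrate_velocities` and the planar pose format are assumptions made for this example.

```python
import math


def integrate_velocities(pose, v, omega, dt):
    """Advance a planar pose (x, y, heading) by one time step.

    v is the translational velocity, omega the rotational velocity,
    and dt the step length. The heading is kept in [0, 2*pi).
    """
    x, y, theta = pose
    # Rotational velocity is integrated into the heading first.
    theta = (theta + omega * dt) % (2.0 * math.pi)
    # Translational velocity is integrated along the updated heading.
    x += v * math.cos(theta) * dt
    y += v * math.sin(theta) * dt
    return (x, y, theta)
```

For example, starting from `(0.0, 0.0, 0.0)`, one step with `v=1.0`, `omega=0.0`, `dt=1.0` moves the pose one unit along the x axis. An attractor network encodes the same quantities as activity bumps over a population of cells rather than as three numbers.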
Abstract: The entorhinal-hippocampal circuit plays a critical role in higher brain functions, especially spatial cognition. Grid cells in the medial entorhinal cortex (MEC) fire periodically with different grid spacings and orientations, which contributes to the ability of place cells in the hippocampus to uniquely encode locations in an environment. However, how sparsely firing granule cells in the dentate gyrus are formed from grid cells in the MEC remains to be determined. Recently, the fruit fly olfactory circuit has been shown to provide a variant of an algorithm called locality-sensitive hashing that can solve this problem. To investigate how sparse place firing generated in the dentate gyrus can help animals resolve perceptual ambiguity during environment exploration, we build a biologically relevant computational model from grid cells to place cells. The weights from grid cells to dentate gyrus granule cells are learned by competitive Hebbian learning. We demonstrate our cognitive mapping model on a robot system using the KITTI odometry benchmark dataset. The experimental results show that our model is able to stably and robustly build a coherent semi-metric topological map in a large-scale outdoor environment. The results suggest that the entorhinal-hippocampal circuit, acting as a variant locality-sensitive hashing algorithm, is capable of generating sparse encodings that make different locations in the environment easy to distinguish. Our experiments also provide theoretical support for the idea that this analogous hashing algorithm may be a general principle of computation across different brain regions and species.
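
The combination of sparse coding and competitive Hebbian learning described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' model: the sparse code is a k-winner-take-all selection over weighted inputs, the learning rule moves only the winners' weights toward the input, and all names (`winner_take_all`, `competitive_hebbian_step`, `random_weights`) and the learning rate are hypothetical.

```python
import random


def winner_take_all(weights, x, k):
    """Return the sorted indices of the k units with the largest weighted input."""
    drive = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    ranked = sorted(range(len(weights)), key=lambda i: drive[i], reverse=True)
    return sorted(ranked[:k])


def competitive_hebbian_step(weights, x, k, lr=0.1):
    """Move the weights of the k winning units toward the input x, in place.

    Losing units are left unchanged, so only the active (sparse) code learns.
    Returns the indices of the winners that were selected before the update.
    """
    winners = winner_take_all(weights, x, k)
    for i in winners:
        weights[i] = [w + lr * (xi - w) for w, xi in zip(weights[i], x)]
    return winners


def random_weights(n_units, n_inputs, seed=0):
    """Random initial weights in [0, 1), seeded for reproducibility."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(n_inputs)] for _ in range(n_units)]
```

Here `x` stands in for a vector of grid cell activities and each row of `weights` for one granule cell. The set of winner indices plays the role of the hash: similar inputs tend to activate overlapping sets of units, while dissimilar inputs tend to activate different ones.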
Abstract: As a robot explores its environment, the map in a simultaneous localization and mapping (SLAM) system grows over time, especially in large-scale environments. This ever-growing map prevents long-term mapping. In this paper, we develop a compact cognitive mapping approach inspired by neurobiological experiments. Inspired by neighborhood cells, we propose neighborhood fields, determined by movement information (i.e., translation and rotation), to describe distinct segments of the explored environment. Vertices and edges whose movement information falls below the threshold of the neighborhood fields are not added to the cognitive map. The optimization of the cognitive map is formulated as a robust non-linear least squares problem, which can be efficiently solved by fast open linear solvers as a general problem. Following the cognitive decision-making process for familiar environments, loop closure edges are clustered according to time intervals, and parallel computing is then applied to perform batch global optimization of the cognitive map, ensuring computational efficiency and real-time performance. After the loop closure process, scene integration is performed, in which revisited vertices are removed to further reduce the size of the cognitive map. A monocular visual SLAM system is developed to test our approach in a rat-like maze environment. Our results suggest that the method largely restricts the growth of the cognitive map over time, while the compact cognitive map still correctly represents the overall layout of the environment, as the standard map does. Experiments demonstrate that our method is well suited to compact cognitive mapping in support of long-term robot mapping. Our approach is simple but pragmatic and efficient for obtaining a compact cognitive map.
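
The neighborhood-field idea, skipping vertices whose movement since the last stored vertex is below a threshold, can be sketched as a simple pose filter. This is only an illustration under assumptions not stated in the abstract: poses are planar `(x, y, heading)` tuples, and a pose is kept once either its translation or its rotation relative to the last kept pose reaches the corresponding threshold. The function name `filter_by_neighborhood` and both threshold parameters are hypothetical.

```python
import math


def filter_by_neighborhood(poses, max_translation, max_rotation):
    """Keep a pose only once it leaves the neighborhood of the last kept pose.

    poses is a list of (x, y, heading_in_radians) tuples. The first pose is
    always kept. The thresholds define the extent of the neighborhood field.
    """
    if not poses:
        return []
    kept = [poses[0]]
    for x, y, theta in poses[1:]:
        kx, ky, ktheta = kept[-1]
        translation = math.hypot(x - kx, y - ky)
        # Smallest absolute angle between the two headings.
        rotation = abs((theta - ktheta + math.pi) % (2.0 * math.pi) - math.pi)
        if translation >= max_translation or rotation >= max_rotation:
            kept.append((x, y, theta))
    return kept
```

With thresholds of 0.5 for both translation and rotation, a robot that creeps forward in 0.1 steps contributes a new vertex only about every fifth step, which is how such a filter limits map growth before any loop closure or scene integration takes place.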