Abstract:The research interest on location-based services has increased during the last years ever since 3D centimetre accuracy inside intelligent environments could be confronted with. This work proposes an indoor local positioning system based on LED lighting, transmitted from a set of beacons to a receiver.The receiver is based on a quadrant photodiode angular diversity aperture (QADA) plus an aperture placed over it.This configuration can be modelled as a perspective camera, where the image position of the transmitters can be used to recover the receiver's 3D pose. This process is known as the perspective-n-point (PnP) problem, which is well known in computer vision and photogrammetry. This work investigates the use of different state-of-the-art PnP algorithms to localize the receiver in a large space based on four co-planar transmitters and with a distance from transmitters to receiver of 3.4 m. Encoding techniques are used to permit the simultaneous emission of all the transmitted signals and their processing in the receiver. In addition, correlation techniques are used to determine the image points projected from each emitter on the QADA. This work uses Monte Carlo simulations to characterize the absolute errors for a grid of test points under noisy measurements, as well as the robustness of the system when varying the 3D location of one transmitter. The IPPE algorithm obtained the best performance in this configuration. The proposal has also been experimentally evaluated in a real setup. The estimation of the receiver's position at three points using the IPPE algorithm achieves average absolute errors of 4.33cm, 3.51cm and 28.90cm in the coordinates x, y and z, respectively. These positioning results are in line with those obtained in previous work using triangulation techniques but with the addition that the complete pose of the receiver is obtained in this proposal.
Abstract:Non-Rigid Structure-from-Motion (NRSfM) reconstructs a deformable 3D object from the correspondences established between monocular 2D images. Current NRSfM methods lack statistical robustness, which is the ability to cope with correspondence errors.This prevents one to use automatically established correspondences, which are prone to errors, thereby strongly limiting the scope of NRSfM. We propose a three-step automatic pipeline to solve NRSfM robustly by exploiting isometry. Step 1 computes the optical flow from correspondences, step 2 reconstructs each 3D point's normal vector using multiple reference images and integrates them to form surfaces with the best reference and step 3 rejects the 3D points that break isometry in their local neighborhood. Importantly, each step is designed to discard or flag erroneous correspondences. Our contributions include the robustification of optical flow by warp estimation, new fast analytic solutions to local normal reconstruction and their robustification, and a new scale-independent measure of 3D local isometric coherence. Experimental results show that our robust NRSfM method consistently outperforms existing methods on both synthetic and real datasets.
Abstract:This paper proposes a DNN-based system that detects multiple people from a single depth image. Our neural network processes a depth image and outputs a likelihood map in image coordinates, where each detection corresponds to a Gaussian-shaped local distribution, centered at the person's head. The likelihood map encodes both the number of detected people and their 2D image positions, and can be used to recover the 3D position of each person using the depth image and the camera calibration parameters. Our architecture is compact, using separated convolutions to increase performance, and runs in real-time with low budget GPUs. We use simulated data for initially training the network, followed by fine tuning with a relatively small amount of real data. We show this strategy to be effective, producing networks that generalize to work with scenes different from those used during training. We thoroughly compare our method against the existing state-of-the-art, including both classical and DNN-based solutions. Our method outperforms existing methods and can accurately detect people in scenes with significant occlusions.
Abstract:In this paper we propose a method based on deep learning that detects multiple people from a single overhead depth image with high reliability. Our neural network, called DPDnet, is based on two fully-convolutional encoder-decoder neural blocks based on residual layers. The Main Block takes a depth image as input and generates a pixel-wise confidence map, where each detected person in the image is represented by a Gaussian-like distribution. The refinement block combines the depth image and the output from the main block, to refine the confidence map. Both blocks are simultaneously trained end-to-end using depth images and head position labels. The experimental work shows that DPDNet outperforms state-of-the-art methods, with accuracies greater than 99% in three different publicly available datasets, without retraining not fine-tuning. In addition, the computational complexity of our proposal is independent of the number of people in the scene and runs in real time using conventional GPUs.
Abstract:We present Deep Shape-from-Template (DeepSfT), a novel Deep Neural Network (DNN) method for solving real-time automatic registration and 3D reconstruction of a deformable object viewed in a single monocular image.DeepSfT advances the state-of-the-art in various aspects. Compared to existing DNN SfT methods, it is the first fully convolutional real-time approach that handles an arbitrary object geometry, topology and surface representation. It also does not require ground truth registration with real data and scales well to very complex object models with large numbers of elements. Compared to previous non-DNN SfT methods, it does not involve numerical optimization at run-time, and is a dense, wide-baseline solution that does not demand, and does not suffer from, feature-based matching. It is able to process a single image with significant deformation and viewpoint changes, and handles well the core challenges of occlusions, weak texture and blur. DeepSfT is based on residual encoder-decoder structures and refining blocks. It is trained end-to-end with a novel combination of supervised learning from simulated renderings of the object model and semi-supervised automatic fine-tuning using real data captured with a standard RGB-D camera. The cameras used for fine-tuning and run-time can be different, making DeepSfT practical for real-world use. We show that DeepSfT significantly outperforms state-of-the-art wide-baseline approaches for non-trivial templates, with quantitative and qualitative evaluation.
Abstract:This article is a study about the existence and the uniqueness of solutions of a specific quadratic first-order ODE that frequently appears in multiple reconstruction problems. It is called the \emph{planar-perspective equation} due to the duality with the geometric problem of reconstruction of planar-perspective curves from their modulus. Solutions of the \emph{planar-perspective equation} are related with planar curves parametrized with perspective parametrization due to this geometric interpretation. The article proves the existence of only two local solutions to the \emph{initial value problem} with \emph{regular initial conditions} and a maximum of two analytic solutions with \emph{critical initial conditions}. The article also gives theorems to extend the local definition domain where the existence of both solutions are guaranteed. It introduces the \emph{maximal depth function} as a function that upper-bound all possible solutions of the \emph{planar-perspective equation} and contains all its possible \emph{critical points}. Finally, the article describes the \emph{maximal-depth solution problem} that consists of finding the solution of the referred equation that has maximum the depth and proves its uniqueness. It is an important problem as it does not need initial conditions to obtain the unique solution and its the frequent solution that practical algorithms of the state-of-the-art give.