Abstract:For autonomous robotics applications, it is crucial that robots are able to accurately measure their potential state and perceive their environment, including other agents within it (e.g., cobots interacting with humans). The redundancy of these measurements is important, as it allows for planning and execution of recovery protocols in the event of sensor failure or external disturbances. Visual estimation can provide this redundancy through the use of low-cost sensors and server as a standalone source of proprioception when no encoder-based sensing is available. Therefore, we estimate the configuration of the robot jointly with its pose, which provides a complete spatial understanding of the observed robot. We present GISR - a method for deep configuration and robot-to-camera pose estimation that prioritizes real-time execution. GISR is comprised of two modules: (i) a geometric initialization module, efficiently computing an approximate robot pose and configuration, and (ii) an iterative silhouette-based refinement module that refines the initial solution in only a few iterations. We evaluate our method on a publicly available dataset and show that GISR performs competitively with existing state-of-the-art approaches, while being significantly faster compared to existing methods of the same class. Our code is available at https://github.com/iwhitey/GISR-robot.