Abstract: Performing delicate Minimally Invasive Surgeries (MIS) requires surgeons to accurately assess the position and orientation (pose) of surgical instruments. In current practice, this pose information is provided by conventional tracking systems (optical and electromagnetic). Two challenges render these systems inadequate for minimally invasive bone surgery: the need for high-precision instrument positioning and occluding tissue blocking the line of sight. Fluoroscopic tracking is limited by the radiation exposure it imposes on patient and surgeon. A possible solution is to constrain the acquisition of x-ray images to a few distinct shots. Because these acquisitions occur at irregular intervals, they call for a pose estimation solution rather than a tracking technique. We develop i3PosNet (Iterative Image Instrument Pose estimation Network), a patch-based modular Deep Learning method enhanced by geometric considerations, which estimates the pose of surgical instruments from single x-rays. For the evaluation of i3PosNet, we consider the scenario of drilling in the otobasis. i3PosNet generalizes well to different instruments, which we show by applying it to a screw, a drill, and a robot. i3PosNet consistently estimates the pose of surgical instruments better than conventional image registration techniques by a factor of 5 or more, achieving in-plane position errors of 0.031 mm ± 0.025 mm and angle errors of 0.031° ± 1.126°. Additional parameters, such as depth, are estimated with errors of 0.361 mm ± 8.98 mm from single radiographs.
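For illustration only, the iterative patch-based estimation loop named in the abstract could be sketched as follows. The abstract fixes no implementation details, so the patch size, the pose parameterization, the function names, and the `predict_pose_update` stand-in for the trained network are all assumptions, not the authors' code.

```python
import numpy as np

PATCH_SIZE = 64  # assumed patch side length in pixels; not specified in the abstract

def crop_patch(image, center, size=PATCH_SIZE):
    """Extract a square patch around the current position estimate."""
    x, y = int(round(center[0])), int(round(center[1]))
    half = size // 2
    return image[y - half:y + half, x - half:x + half]

def predict_pose_update(patch):
    """Stand-in for the trained network: returns (dx, dy, d_angle, d_depth).

    In i3PosNet this step would be a forward pass of the CNN; here it
    returns zeros so the sketch runs without trained weights.
    """
    return np.zeros(4)

def estimate_pose(image, initial_pose, iterations=3):
    """Iteratively refine an instrument pose from a single x-ray.

    `initial_pose` = (x, y, angle, depth). Each iteration re-crops a
    patch around the current estimate and applies the predicted update,
    so later patches are better centered on the instrument.
    """
    x, y, angle, depth = initial_pose
    for _ in range(iterations):
        patch = crop_patch(image, (x, y))
        dx, dy, d_angle, d_depth = predict_pose_update(patch)
        x, y, angle, depth = x + dx, y + dy, angle + d_angle, depth + d_depth
    return x, y, angle, depth

# Usage: a synthetic radiograph and a rough initial pose guess.
radiograph = np.random.rand(512, 512)
pose = estimate_pose(radiograph, initial_pose=(256.0, 256.0, 0.0, 50.0))
```

The iterative re-cropping is the point of the sketch: because each update recenters the patch on the instrument, the network only ever has to predict small corrections, which is what makes a patch-based estimator viable for single-shot radiographs.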