Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ivan Petrović

An Active Inference Model of Covert and Overt Visual Attention

May 06, 2025

Tin Mišić, Karlo Koledić, Fabio Bonsignorio, Ivan Petrović, Ivan Marković

Abstract:The ability to selectively attend to relevant stimuli while filtering out distractions is essential for agents that process complex, high-dimensional sensory input. This paper introduces a model of covert and overt visual attention through the framework of active inference, utilizing dynamic optimization of sensory precisions to minimize free-energy. The model determines visual sensory precisions based on both current environmental beliefs and sensory input, influencing attentional allocation in both covert and overt modalities. To test the effectiveness of the model, we analyze its behavior in the Posner cueing task and a simple target focus task using two-dimensional(2D) visual data. Reaction times are measured to investigate the interplay between exogenous and endogenous attention, as well as valid and invalid cueing. The results show that exogenous and valid cues generally lead to faster reaction times compared to endogenous and invalid cues. Furthermore, the model exhibits behavior similar to inhibition of return, where previously attended locations become suppressed after a specific cue-target onset asynchrony interval. Lastly, we investigate different aspects of overt attention and show that involuntary, reflexive saccades occur faster than intentional ones, but at the expense of adaptability.

* 7 pages, 7 figures. Code available at https://github.com/unizgfer-lamor/ainf-visual-attention

Via

Access Paper or Ask Questions

Impact of Temporal Delay on Radar-Inertial Odometry

Mar 04, 2025

Vlaho-Josip Štironja, Luka Petrović, Juraj Peršić, Ivan Marković, Ivan Petrović

Abstract:Accurate ego-motion estimation is a critical component of any autonomous system. Conventional ego-motion sensors, such as cameras and LiDARs, may be compromised in adverse environmental conditions, such as fog, heavy rain, or dust. Automotive radars, known for their robustness to such conditions, present themselves as complementary sensors or a promising alternative within the ego-motion estimation frameworks. In this paper we propose a novel Radar-Inertial Odometry (RIO) system that integrates an automotive radar and an inertial measurement unit. The key contribution is the integration of online temporal delay calibration within the factor graph optimization framework that compensates for potential time offsets between radar and IMU measurements. To validate the proposed approach we have conducted thorough experimental analysis on real-world radar and IMU data. The results show that, even without scan matching or target tracking, integration of online temporal calibration significantly reduces localization error compared to systems that disregard time synchronization, thus highlighting the important role of, often neglected, accurate temporal alignment in radar-based sensor fusion systems for autonomous navigation.

Via

Access Paper or Ask Questions

RAVE: A Framework for Radar Ego-Velocity Estimation

Jun 27, 2024

Vlaho-Josip Štironja, Luka Petrović, Juraj Peršić, Ivan Marković, Ivan Petrović

Figure 1 for RAVE: A Framework for Radar Ego-Velocity Estimation

Figure 2 for RAVE: A Framework for Radar Ego-Velocity Estimation

Figure 3 for RAVE: A Framework for Radar Ego-Velocity Estimation

Figure 4 for RAVE: A Framework for Radar Ego-Velocity Estimation

Abstract:State estimation is an essential component of autonomous systems, usually relying on sensor fusion that integrates data from cameras, LiDARs and IMUs. Recently, radars have shown the potential to improve the accuracy and robustness of state estimation and perception, especially in challenging environmental conditions such as adverse weather and low-light scenarios. In this paper, we present a framework for ego-velocity estimation, which we call RAVE, that relies on 3D automotive radar data and encompasses zero velocity detection, outlier rejection, and velocity estimation. In addition, we propose a simple filtering method to discard infeasible ego-velocity estimates. We also conduct a systematic analysis of how different existing outlier rejection techniques and optimization loss functions impact estimation accuracy. Our evaluation on three open-source datasets demonstrates the effectiveness of the proposed filter and a significant positive impact of RAVE on the odometry accuracy. Furthermore, we release an open-source implementation of the proposed framework for radar ego-velocity estimation accompanied with a ROS interface.

Via

Access Paper or Ask Questions

GISR: Geometric Initialization and Silhouette-based Refinement for Single-View Robot Pose and Configuration Estimation

May 08, 2024

Ivan Bilić, Filip Marić, Fabio Bonsignorio, Ivan Petrović

Figure 1 for GISR: Geometric Initialization and Silhouette-based Refinement for Single-View Robot Pose and Configuration Estimation

Figure 2 for GISR: Geometric Initialization and Silhouette-based Refinement for Single-View Robot Pose and Configuration Estimation

Figure 3 for GISR: Geometric Initialization and Silhouette-based Refinement for Single-View Robot Pose and Configuration Estimation

Figure 4 for GISR: Geometric Initialization and Silhouette-based Refinement for Single-View Robot Pose and Configuration Estimation

Abstract:For autonomous robotics applications, it is crucial that robots are able to accurately measure their potential state and perceive their environment, including other agents within it (e.g., cobots interacting with humans). The redundancy of these measurements is important, as it allows for planning and execution of recovery protocols in the event of sensor failure or external disturbances. Visual estimation can provide this redundancy through the use of low-cost sensors and server as a standalone source of proprioception when no encoder-based sensing is available. Therefore, we estimate the configuration of the robot jointly with its pose, which provides a complete spatial understanding of the observed robot. We present GISR - a method for deep configuration and robot-to-camera pose estimation that prioritizes real-time execution. GISR is comprised of two modules: (i) a geometric initialization module, efficiently computing an approximate robot pose and configuration, and (ii) an iterative silhouette-based refinement module that refines the initial solution in only a few iterations. We evaluate our method on a publicly available dataset and show that GISR performs competitively with existing state-of-the-art approaches, while being significantly faster compared to existing methods of the same class. Our code is available at https://github.com/iwhitey/GISR-robot.

* Submitted to IEEE Robotics and Automation Letters (RA-L)

Via

Access Paper or Ask Questions

GenDepth: Generalizing Monocular Depth Estimation for Arbitrary Camera Parameters via Ground Plane Embedding

Dec 10, 2023

Karlo Koledić, Luka Petrović, Ivan Petrović, Ivan Marković

Abstract:Learning-based monocular depth estimation leverages geometric priors present in the training data to enable metric depth perception from a single image, a traditionally ill-posed problem. However, these priors are often specific to a particular domain, leading to limited generalization performance on unseen data. Apart from the well studied environmental domain gap, monocular depth estimation is also sensitive to the domain gap induced by varying camera parameters, an aspect that is often overlooked in current state-of-the-art approaches. This issue is particularly evident in autonomous driving scenarios, where datasets are typically collected with a single vehicle-camera setup, leading to a bias in the training data due to a fixed perspective geometry. In this paper, we challenge this trend and introduce GenDepth, a novel model capable of performing metric depth estimation for arbitrary vehicle-camera setups. To address the lack of data with sufficiently diverse camera parameters, we first create a bespoke synthetic dataset collected with different vehicle-camera systems. Then, we design GenDepth to simultaneously optimize two objectives: (i) equivariance to the camera parameter variations on synthetic data, (ii) transferring the learned equivariance to real-world environmental features using a single real-world dataset with a fixed vehicle-camera system. To achieve this, we propose a novel embedding of camera parameters as the ground plane depth and present a novel architecture that integrates these embeddings with adversarial domain alignment. We validate GenDepth on several autonomous driving datasets, demonstrating its state-of-the-art generalization capability for different vehicle-camera systems.

Via

Access Paper or Ask Questions

Euclidean Equivariant Models for Generative Graphical Inverse Kinematics

Jul 04, 2023

Oliver Limoyo, Filip Marić, Matthew Giamou, Petra Alexson, Ivan Petrović, Jonathan Kelly

Figure 1 for Euclidean Equivariant Models for Generative Graphical Inverse Kinematics

Figure 2 for Euclidean Equivariant Models for Generative Graphical Inverse Kinematics

Figure 3 for Euclidean Equivariant Models for Generative Graphical Inverse Kinematics

Figure 4 for Euclidean Equivariant Models for Generative Graphical Inverse Kinematics

Abstract:Quickly and reliably finding accurate inverse kinematics (IK) solutions remains a challenging problem for robotic manipulation. Existing numerical solvers typically produce a single solution only and rely on local search techniques to minimize a highly nonconvex objective function. Recently, learning-based approaches that approximate the entire feasible set of solutions have shown promise as a means to generate multiple fast and accurate IK results in parallel. However, existing learning-based techniques have a significant drawback: each robot of interest requires a specialized model that must be trained from scratch. To address this shortcoming, we investigate a novel distance-geometric robot representation coupled with a graph structure that allows us to leverage the flexibility of graph neural networks (GNNs). We use this approach to train a generative graphical inverse kinematics solver (GGIK) that is able to produce a large number of diverse solutions in parallel while also generalizing well -- a single learned model can be used to produce IK solutions for a variety of different robots. The graphical formulation elegantly exposes the symmetry and Euclidean equivariance of the IK problem that stems from the spatial nature of robot manipulators. We exploit this symmetry by encoding it into the architecture of our learned model, yielding a flexible solver that is able to produce sets of IK solutions for multiple robots.

* Proceedings of the Robotics: Science and Systems (RSS'23) Workshop on Symmetries in Robot Learning, Daegu, Republic of Korea, Jul. 10, 2023. arXiv admin note: substantial text overlap with arXiv:2209.08812

Via

Access Paper or Ask Questions

A Distance-Geometric Method for Recovering Robot Joint Angles From an RGB Image

Jan 05, 2023

Ivan Bilić, Filip Marić, Ivan Marković, Ivan Petrović

Figure 1 for A Distance-Geometric Method for Recovering Robot Joint Angles From an RGB Image

Figure 2 for A Distance-Geometric Method for Recovering Robot Joint Angles From an RGB Image

Figure 3 for A Distance-Geometric Method for Recovering Robot Joint Angles From an RGB Image

Figure 4 for A Distance-Geometric Method for Recovering Robot Joint Angles From an RGB Image

Abstract:Autonomous manipulation systems operating in domains where human intervention is difficult or impossible (e.g., underwater, extraterrestrial or hazardous environments) require a high degree of robustness to sensing and communication failures. Crucially, motion planning and control algorithms require a stream of accurate joint angle data provided by joint encoders, the failure of which may result in an unrecoverable loss of functionality. In this paper, we present a novel method for retrieving the joint angles of a robot manipulator using only a single RGB image of its current configuration, opening up an avenue for recovering system functionality when conventional proprioceptive sensing is unavailable. Our approach, based on a distance-geometric representation of the configuration space, exploits the knowledge of a robot's kinematic model with the goal of training a shallow neural network that performs a 2D-to-3D regression of distances associated with detected structural keypoints. It is shown that the resulting Euclidean distance matrix uniquely corresponds to the observed configuration, where joint angles can be recovered via multidimensional scaling and a simple inverse kinematics procedure. We evaluate the performance of our approach on real RGB images of a Franka Emika Panda manipulator, showing that the proposed method is efficient and exhibits solid generalization ability. Furthermore, we show that our method can be easily combined with a dense refinement technique to obtain superior results.

* Submitted to IFAC

Via

Access Paper or Ask Questions

One Network, Many Robots: Generative Graphical Inverse Kinematics

Sep 22, 2022

Oliver Limoyo, Filip Marić, Matthew Giamou, Petra Alexson, Ivan Petrović, Jonathan Kelly

Figure 1 for One Network, Many Robots: Generative Graphical Inverse Kinematics

Figure 2 for One Network, Many Robots: Generative Graphical Inverse Kinematics

Figure 3 for One Network, Many Robots: Generative Graphical Inverse Kinematics

Figure 4 for One Network, Many Robots: Generative Graphical Inverse Kinematics

Abstract:Quickly and reliably finding accurate inverse kinematics (IK) solutions remains a challenging problem for robotic manipulation. Existing numerical solvers are broadly applicable, but rely on local search techniques to manage highly nonconvex objective functions. Recently, learning-based approaches have shown promise as a means to generate fast and accurate IK results; learned solvers can easily be integrated with other learning algorithms in end-to-end systems. However, learning-based methods have an Achilles' heel: each robot of interest requires a specialized model which must be trained from scratch. To address this key shortcoming, we investigate a novel distance-geometric robot representation coupled with a graph structure that allows us to leverage the flexibility of graph neural networks (GNNs). We use this approach to train the first learned generative graphical inverse kinematics (GGIK) solver that is, crucially, "robot-agnostic"-a single model is able to provide IK solutions for a variety of different robots. Additionally, the generative nature of GGIK allows the solver to produce a large number of diverse solutions in parallel with minimal additional computation time, making it appropriate for applications such as sampling-based motion planning. Finally, GGIK can complement local IK solvers by providing reliable initializations. These advantages, as well as the ability to use task-relevant priors and to continuously improve with new data, suggest that GGIK has the potential to be a key component of flexible, learning-based robotic manipulation systems.

Via

Access Paper or Ask Questions

Recalibrating the KITTI Dataset Camera Setup for Improved Odometry Accuracy

Sep 08, 2021

Igor Cvišić, Ivan Marković, Ivan Petrović

Figure 1 for Recalibrating the KITTI Dataset Camera Setup for Improved Odometry Accuracy

Figure 2 for Recalibrating the KITTI Dataset Camera Setup for Improved Odometry Accuracy

Figure 3 for Recalibrating the KITTI Dataset Camera Setup for Improved Odometry Accuracy

Figure 4 for Recalibrating the KITTI Dataset Camera Setup for Improved Odometry Accuracy

Abstract:Over the last decade, one of the most relevant public datasets for evaluating odometry accuracy is the KITTI dataset. Beside the quality and rich sensor setup, its success is also due to the online evaluation tool, which enables researchers to benchmark and compare algorithms. The results are evaluated on the test subset solely, without any knowledge about the ground truth, yielding unbiased, overfit free and therefore relevant validation for robot localization based on cameras, 3D laser or combination of both. However, as any sensor setup, it requires prior calibration and rectified stereo images are provided, introducing dependence on the default calibration parameters. Given that, a natural question arises if a better set of calibration parameters can be found that would yield higher odometry accuracy. In this paper, we propose a new approach for one shot calibration of the KITTI dataset multiple camera setup. The approach yields better calibration parameters, both in the sense of lower calibration reprojection errors and lower visual odometry error. We conducted experiments where we show for three different odometry algorithms, namely SOFT2, ORB-SLAM2 and VISO2, that odometry accuracy is significantly improved with the proposed calibration parameters. Moreover, our odometry, SOFT2, in conjunction with the proposed calibration method achieved the highest accuracy on the official KITTI scoreboard with 0.53% translational and 0.0009 deg/m rotational error, outperforming even 3D laser-based methods.

* European Conference on Mobile Robots (ECMR) 2021

Via

Access Paper or Ask Questions

Convex Iteration for Distance-Geometric Inverse Kinematics

Sep 08, 2021

Matthew Giamou, Filip Marić, David M. Rosen, Valentin Peretroukhin, Nicholas Roy, Ivan Petrović, Jonathan Kelly

Figure 1 for Convex Iteration for Distance-Geometric Inverse Kinematics

Figure 2 for Convex Iteration for Distance-Geometric Inverse Kinematics

Figure 3 for Convex Iteration for Distance-Geometric Inverse Kinematics

Figure 4 for Convex Iteration for Distance-Geometric Inverse Kinematics

Abstract:Inverse kinematics (IK) is the problem of finding robot joint configurations that satisfy constraints on the position or pose of one or more end-effectors. For robots with redundant degrees of freedom, there is often an infinite, nonconvex set of solutions. The IK problem is further complicated when collision avoidance constraints are imposed by obstacles in the workspace. In general, closed-form expressions yielding feasible configurations do not exist, motivating the use of numerical solution methods. However, these approaches rely on local optimization of nonconvex problems, often requiring an accurate initialization or numerous re-initializations to converge to a valid solution. In this work, we first formulate complicated inverse kinematics problems as convex feasibility problems whose low-rank feasible points provide exact IK solutions. We then present CIDGIK (Convex Iteration for Distance-Geometric Inverse Kinematics), an algorithm that solves these feasibility problems with a sequence of semidefinite programs whose objectives are designed to encourage low-rank minimizers. Our problem formulation elegantly unifies the configuration space and workspace constraints of a robot: intrinsic robot geometry and obstacle avoidance are both expressed as simple linear matrix equations and inequalities. Our experimental results for a variety of popular manipulator models demonstrate faster and more accurate convergence than a conventional nonlinear optimization-based approach, especially in environments with many obstacles.

* Submitted to IEEE Robotics and Automation Letters

Via

Access Paper or Ask Questions