Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Viktor Kocur

Efficient Vision-based Vehicle Speed Estimation

May 02, 2025

Andrej Macko, Lukáš Gajdošech, Viktor Kocur

Abstract:This paper presents a computationally efficient method for vehicle speed estimation from traffic camera footage. Building upon previous work that utilizes 3D bounding boxes derived from 2D detections and vanishing point geometry, we introduce several improvements to enhance real-time performance. We evaluate our method in several variants on the BrnoCompSpeed dataset in terms of vehicle detection and speed estimation accuracy. Our extensive evaluation across various hardware platforms, including edge devices, demonstrates significant gains in frames per second (FPS) compared to the prior state-of-the-art, while maintaining comparable or improved speed estimation accuracy. We analyze the trade-off between accuracy and computational cost, showing that smaller models utilizing post-training quantization offer the best balance for real-world deployment. Our best performing model beats previous state-of-the-art in terms of median vehicle speed estimation error (0.58 km/h vs. 0.60 km/h), detection precision (91.02% vs 87.08%) and recall (91.14% vs. 83.32%) while also being 5.5 times faster.

* Submitted to Journal of Real-Time Image Processing (JRTIP)

Via

Access Paper or Ask Questions

Are Minimal Radial Distortion Solvers Really Necessary for Relative Pose Estimation?

May 01, 2025

Viktor Kocur, Charalambos Tzamos, Yaqing Ding, Zuzana Berger Haladova, Torsten Sattler, Zuzana Kukelova

Abstract:Estimating the relative pose between two cameras is a fundamental step in many applications such as Structure-from-Motion. The common approach to relative pose estimation is to apply a minimal solver inside a RANSAC loop. Highly efficient solvers exist for pinhole cameras. Yet, (nearly) all cameras exhibit radial distortion. Not modeling radial distortion leads to (significantly) worse results. However, minimal radial distortion solvers are significantly more complex than pinhole solvers, both in terms of run-time and implementation efforts. This paper compares radial distortion solvers with two simple-to-implement approaches that do not use minimal radial distortion solvers: The first approach combines an efficient pinhole solver with sampled radial undistortion parameters, where the sampled parameters are used for undistortion prior to applying the pinhole solver. The second approach uses a state-of-the-art neural network to estimate the distortion parameters rather than sampling them from a set of potential values. Extensive experiments on multiple datasets, and different camera setups, show that complex minimal radial distortion solvers are not necessary in practice. We discuss under which conditions a simple sampling of radial undistortion parameters is preferable over calibrating cameras using a learning-based prior approach. Code and newly created benchmark for relative pose estimation under radial distortion are available at https://github.com/kocurvik/rdnet.

* arXiv admin note: substantial text overlap with arXiv:2410.05984

Via

Access Paper or Ask Questions

Three-view Focal Length Recovery From Homographies

Jan 13, 2025

Yaqing Ding, Viktor Kocur, Zuzana Berger Haladová, Qianliang Wu, Shen Cai, Jian Yang, Zuzana Kukelova

Figure 1 for Three-view Focal Length Recovery From Homographies

Figure 2 for Three-view Focal Length Recovery From Homographies

Figure 3 for Three-view Focal Length Recovery From Homographies

Figure 4 for Three-view Focal Length Recovery From Homographies

Abstract:In this paper, we propose a novel approach for recovering focal lengths from three-view homographies. By examining the consistency of normal vectors between two homographies, we derive new explicit constraints between the focal lengths and homographies using an elimination technique. We demonstrate that three-view homographies provide two additional constraints, enabling the recovery of one or two focal lengths. We discuss four possible cases, including three cameras having an unknown equal focal length, three cameras having two different unknown focal lengths, three cameras where one focal length is known, and the other two cameras have equal or different unknown focal lengths. All the problems can be converted into solving polynomials in one or two unknowns, which can be efficiently solved using Sturm sequence or hidden variable technique. Evaluation using both synthetic and real data shows that the proposed solvers are both faster and more accurate than methods relying on existing two-view solvers. The code and data are available on https://github.com/kocurvik/hf

* Code available at https://github.com/kocurvik/hf Dataset available at: https://doi.org/10.5281/zenodo.14638904

Via

Access Paper or Ask Questions

Fixing the Scale and Shift in Monocular Depth For Camera Pose Estimation

Jan 13, 2025

Yaqing Ding, Václav Vávra, Viktor Kocur, Jian Yang, Torsten Sattler, Zuzana Kukelova

Figure 1 for Fixing the Scale and Shift in Monocular Depth For Camera Pose Estimation

Figure 2 for Fixing the Scale and Shift in Monocular Depth For Camera Pose Estimation

Figure 3 for Fixing the Scale and Shift in Monocular Depth For Camera Pose Estimation

Figure 4 for Fixing the Scale and Shift in Monocular Depth For Camera Pose Estimation

Abstract:Recent advances in monocular depth prediction have led to significantly improved depth prediction accuracy. In turn, this enables various applications to use such depth predictions. In this paper, we propose a novel framework for estimating the relative pose between two cameras from point correspondences with associated monocular depths. Since depth predictions are typically defined up to an unknown scale and shift parameter, our solvers jointly estimate both scale and shift parameters together with the camera pose. We derive efficient solvers for three cases: (1) two calibrated cameras, (2) two uncalibrated cameras with an unknown but shared focal length, and (3) two uncalibrated cameras with unknown and different focal lengths. Experiments on synthetic and real data, including experiments with depth maps estimated by 11 different depth predictors, show the practical viability of our solvers. Compared to prior work, our solvers achieve state-of-the-art results on two large-scale, real-world datasets. The source code is available at https://github.com/yaqding/pose_monodepth

* 14 pages

Via

Access Paper or Ask Questions

On Representation of 3D Rotation in the Context of Deep Learning

Oct 15, 2024

Viktória Pravdová, Lukáš Gajdošech, Hassan Ali, Viktor Kocur

Figure 1 for On Representation of 3D Rotation in the Context of Deep Learning

Figure 2 for On Representation of 3D Rotation in the Context of Deep Learning

Figure 3 for On Representation of 3D Rotation in the Context of Deep Learning

Figure 4 for On Representation of 3D Rotation in the Context of Deep Learning

Abstract:This paper investigates various methods of representing 3D rotations and their impact on the learning process of deep neural networks. We evaluated the performance of ResNet18 networks for 3D rotation estimation using several rotation representations and loss functions on both synthetic and real data. The real datasets contained 3D scans of industrial bins, while the synthetic datasets included views of a simple asymmetric object rendered under different rotations. On synthetic data, we also assessed the effects of different rotation distributions within the training and test sets, as well as the impact of the object's texture. In line with previous research, we found that networks using the continuous 5D and 6D representations performed better than the discontinuous ones.

* Accepted at International Conference on Computer Vision and Graphics ICCVG 2024. The proceedings of the conference will be published in Lecture Notes in Networks and Systems (LNNS), Springer

Via

Access Paper or Ask Questions

Are Minimal Radial Distortion Solvers Necessary for Relative Pose Estimation?

Oct 08, 2024

Charalambos Tzamos, Viktor Kocur, Yaqing Ding, Torsten Sattler, Zuzana Kukelova

Abstract:Estimating the relative pose between two cameras is a fundamental step in many applications such as Structure-from-Motion. The common approach to relative pose estimation is to apply a minimal solver inside a RANSAC loop. Highly efficient solvers exist for pinhole cameras. Yet, (nearly) all cameras exhibit radial distortion. Not modeling radial distortion leads to (significantly) worse results. However, minimal radial distortion solvers are significantly more complex than pinhole solvers, both in terms of run-time and implementation efforts. This paper compares radial distortion solvers with a simple-to-implement approach that combines an efficient pinhole solver with sampled radial distortion parameters. Extensive experiments on multiple datasets and RANSAC variants show that this simple approach performs similarly or better than the most accurate minimal distortion solvers at faster run-times while being significantly more accurate than faster non-minimal solvers. We clearly show that complex radial distortion solvers are not necessary in practice. Code and benchmark are available at https://github.com/kocurvik/rd.

Via

Access Paper or Ask Questions

Enhancement of 3D Camera Synthetic Training Data with Noise Models

Feb 26, 2024

Katarína Osvaldová, Lukáš Gajdošech, Viktor Kocur, Martin Madaras

Abstract:The goal of this paper is to assess the impact of noise in 3D camera-captured data by modeling the noise of the imaging process and applying it on synthetic training data. We compiled a dataset of specifically constructed scenes to obtain a noise model. We specifically model lateral noise, affecting the position of captured points in the image plane, and axial noise, affecting the position along the axis perpendicular to the image plane. The estimated models can be used to emulate noise in synthetic training data. The added benefit of adding artificial noise is evaluated in an experiment with rendered data for object segmentation. We train a series of neural networks with varying levels of noise in the data and measure their ability to generalize on real data. The results show that using too little or too much noise can hurt the networks' performance indicating that obtaining a model of noise from real scanners is beneficial for synthetic data generation.

* Proceedings of the 27th Computer Vision Winter Workshop CVWW (2024) 29-37
* Published in 2024 Proceedings of the 27th Computer Vision Winter Workshop (CVWW). Accepted: 19.1.2024. Published: 16.2.2024. This work was funded by the Horizon-Widera-2021 European Twinning project TERAIS G.A. n. 101079338. Code: https://doi.org/10.5281/zenodo.10581562 Data: https://doi.org/10.5281/zenodo.10581278

Via

Access Paper or Ask Questions

Robust Self-calibration of Focal Lengths from the Fundamental Matrix

Nov 27, 2023

Viktor Kocur, Daniel Kyselica, Zuzana Kúkelová

Abstract:The problem of self-calibration of two cameras from a given fundamental matrix is one of the basic problems in geometric computer vision. Under the assumption of known principal points and square pixels, the well-known Bougnoux formula offers a means to compute the two unknown focal lengths. However, in many practical situations, the formula yields inaccurate results due to commonly occurring singularities. Moreover, the estimates are sensitive to noise in the computed fundamental matrix and to the assumed positions of the principal points. In this paper, we therefore propose an efficient and robust iterative method to estimate the focal lengths along with the principal points of the cameras given a fundamental matrix and priors for the estimated camera parameters. In addition, we study a computationally efficient check of models generated within RANSAC that improves the accuracy of the estimated models while reducing the total computational time. Extensive experiments on real and synthetic data show that our iterative method brings significant improvements in terms of the accuracy of the estimated focal lengths over the Bougnoux formula and other state-of-the-art methods, even when relying on inaccurate priors.

Via

Access Paper or Ask Questions

Processing and Segmentation of Human Teeth from 2D Images using Weakly Supervised Learning

Nov 13, 2023

Tomáš Kunzo, Viktor Kocur, Lukáš Gajdošech, Martin Madaras

Abstract:Teeth segmentation is an essential task in dental image analysis for accurate diagnosis and treatment planning. While supervised deep learning methods can be utilized for teeth segmentation, they often require extensive manual annotation of segmentation masks, which is time-consuming and costly. In this research, we propose a weakly supervised approach for teeth segmentation that reduces the need for manual annotation. Our method utilizes the output heatmaps and intermediate feature maps from a keypoint detection network to guide the segmentation process. We introduce the TriDental dataset, consisting of 3000 oral cavity images annotated with teeth keypoints, to train a teeth keypoint detection network. We combine feature maps from different layers of the keypoint detection network, enabling accurate teeth segmentation without explicit segmentation annotations. The detected keypoints are also used for further refinement of the segmentation masks. Experimental results on the TriDental dataset demonstrate the superiority of our approach in terms of accuracy and robustness compared to state-of-the-art segmentation methods. Our method offers a cost-effective and efficient solution for teeth segmentation in real-world dental applications, eliminating the need for extensive manual annotation efforts.

* Pubslished in 2023 World Symposium on Digital Intelligence for Systems and Machines (DISA) Proceedings. Published version copyrighted by IEEE, pre-print released in accordance with the copyright agreement

Via

Access Paper or Ask Questions

Evaluating the Significance of Outdoor Advertising from Driver's Perspective Using Computer Vision

Nov 13, 2023

Zuzana Černeková, Zuzana Berger Haladová, Ján Špirka, Viktor Kocur

Abstract:Outdoor advertising, such as roadside billboards, plays a significant role in marketing campaigns but can also be a distraction for drivers, potentially leading to accidents. In this study, we propose a pipeline for evaluating the significance of roadside billboards in videos captured from a driver's perspective. We have collected and annotated a new BillboardLamac dataset, comprising eight videos captured by drivers driving through a predefined path wearing eye-tracking devices. The dataset includes annotations of billboards, including 154 unique IDs and 155 thousand bounding boxes, as well as eye fixation data. We evaluate various object tracking methods in combination with a YOLOv8 detector to identify billboard advertisements with the best approach achieving 38.5 HOTA on BillboardLamac. Additionally, we train a random forest classifier to classify billboards into three classes based on the length of driver fixations achieving 75.8% test accuracy. An analysis of the trained classifier reveals that the duration of billboard visibility, its saliency, and size are the most influential features when assessing billboard significance.

Via

Access Paper or Ask Questions