Topic: Line Segment Detection
What is Line Segment Detection? Line segment detection is the process of identifying straight line segments, typically represented by their two endpoints, in an image or video.
Papers and Code
Dec 09, 2024
Abstract: Echocardiography is the most widely used imaging modality for monitoring cardiac function, serving as the first line in the early detection of myocardial ischemia and infarction. However, echocardiography often suffers from several artifacts, including sensor noise, lack of contrast, severe saturation, and missing myocardial segments, which severely limit its usage in clinical diagnosis. In recent years, several machine learning methods have been proposed to improve echocardiography views. Yet, these methods usually address only a specific problem (e.g., denoising) and thus cannot provide a robust and reliable restoration in general. On the other hand, cardiac MRI provides a clean view of the heart without suffering from such severe issues. However, due to its significantly higher cost, it is often affordable only for major hospitals, which hinders its use and accessibility. In this pilot study, we propose a novel approach to transform echocardiography into the cardiac MRI view. For this purpose, the Echo2MRI dataset, consisting of paired echocardiography and real cardiac MRI images, has been composed and will be shared publicly. A dedicated Cycle-consistent Generative Adversarial Network (Cycle-GAN) is trained to learn the transformation from echocardiography frames to cardiac MRI views. An extensive set of qualitative evaluations shows that the proposed model can synthesize high-quality, artifact-free synthetic cardiac MRI views from a given sequence of echocardiography frames. Medical evaluations performed by a group of cardiologists further demonstrate that the synthetic MRI views are indistinguishable from their original counterparts and are preferred over the original echocardiography sequences for diagnosis in 78.9% of cases.
* 18 pages, 42 figures
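The paper above trains a Cycle-GAN for echocardiography-to-MRI translation. As a rough illustration of the cycle-consistency objective such a model optimizes, here is a minimal PyTorch sketch; the toy generators and names like `G_echo2mri` are placeholders, not the paper's actual architecture or training recipe, and the adversarial terms are omitted.

```python
import torch
import torch.nn as nn

# Toy generators standing in for the paper's Cycle-GAN generators (hypothetical shapes).
def tiny_generator():
    return nn.Sequential(
        nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 1, 3, padding=1), nn.Tanh(),
    )

G_echo2mri = tiny_generator()   # echocardiography -> synthetic MRI
G_mri2echo = tiny_generator()   # MRI -> synthetic echocardiography

l1 = nn.L1Loss()
opt = torch.optim.Adam(
    list(G_echo2mri.parameters()) + list(G_mri2echo.parameters()), lr=2e-4
)

# One illustrative update with random tensors in place of real image pairs.
echo = torch.rand(4, 1, 128, 128) * 2 - 1   # batch of echo frames in [-1, 1]
mri = torch.rand(4, 1, 128, 128) * 2 - 1    # batch of MRI frames in [-1, 1]

fake_mri = G_echo2mri(echo)
fake_echo = G_mri2echo(mri)

# Cycle-consistency: translating there and back should recover the input.
cycle_loss = l1(G_mri2echo(fake_mri), echo) + l1(G_echo2mri(fake_echo), mri)

# Adversarial and identity terms used by a full Cycle-GAN are omitted for brevity.
opt.zero_grad()
cycle_loss.backward()
opt.step()
```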
Dec 02, 2024
Abstract: Converting images of text into plain text is a widely researched problem in both academia and industry. Arabic Handwritten Text Recognition (AHTR) poses additional challenges due to diverse handwriting styles and limited labeled data. In this paper, we present a complete OCR pipeline that starts with line segmentation using Differentiable Binarization and Adaptive Scale Fusion techniques to ensure accurate detection of text lines. Following segmentation, a CNN-BiLSTM-CTC architecture is applied to recognize characters. Our system, trained on the Arabic Multi-Fonts Dataset (AMFDS), achieves a Character Recognition Rate (CRR) of 99.20% and a Word Recognition Rate (WRR) of 93.75% on single-word samples containing 7 to 10 characters, along with a CRR of 83.76% for sentences. These results demonstrate the system's strong performance in handling Arabic scripts and establish a new benchmark for AHTR systems.
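For context, a minimal PyTorch sketch of a CNN-BiLSTM-CTC recognizer of the kind the abstract describes is shown below; the layer sizes, the 100-symbol character set, and the 32-pixel line height are illustrative assumptions rather than the authors' configuration, and the Differentiable Binarization segmentation stage is not shown.

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    """Toy CNN-BiLSTM-CTC recognizer (dimensions are illustrative, not the paper's)."""
    def __init__(self, num_classes):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.rnn = nn.LSTM(128 * 8, 256, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(512, num_classes)  # num_classes includes the CTC blank

    def forward(self, x):                      # x: (B, 1, 32, W)
        f = self.cnn(x)                        # (B, 128, 8, W/4)
        f = f.permute(0, 3, 1, 2).flatten(2)   # sequence over width: (B, W/4, 128*8)
        out, _ = self.rnn(f)
        return self.fc(out).log_softmax(-1)    # (B, T, num_classes)

model = CRNN(num_classes=100)                  # e.g. character set + CTC blank
images = torch.rand(2, 1, 32, 128)             # two cropped text-line images
log_probs = model(images).permute(1, 0, 2)     # CTC expects (T, B, C)

targets = torch.randint(1, 100, (2, 8))        # dummy label sequences
ctc = nn.CTCLoss(blank=0)
loss = ctc(log_probs, targets,
           input_lengths=torch.full((2,), log_probs.size(0), dtype=torch.long),
           target_lengths=torch.full((2,), 8, dtype=torch.long))
loss.backward()
```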
Nov 20, 2024
Abstract: Line segment detection is a fundamental low-level task in computer vision, and improvements to it can benefit the more advanced methods that depend on it. Most new methods developed for line segment detection are based on Convolutional Neural Networks (CNNs). Our paper seeks to address the challenges that prevent wider adoption of transformer-based methods for line segment detection. Specifically, we propose a novel Deformable Transformer-based Line Segment Detector (DT-LSD) that supports cross-scale interactions, can be trained quickly, and addresses LETR's drawbacks. For faster training, we introduce Line Contrastive DeNoising (LCDN), a technique that stabilizes the one-to-one matching process and speeds up training by 34$\times$. We show that DT-LSD is faster and more accurate than its transformer-based predecessor (LETR) and outperforms all CNN-based models in terms of accuracy. On the Wireframe dataset, DT-LSD achieves 71.7 $sAP^{10}$ and 73.9 $sAP^{15}$; on the YorkUrban dataset, it achieves 33.2 $sAP^{10}$ and 35.1 $sAP^{15}$.
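To make the transformer-based formulation concrete, here is a minimal DETR-style line-segment head in PyTorch: learned line queries cross-attend to image features, and each query regresses two endpoints plus a confidence score. A plain transformer decoder stands in for DT-LSD's deformable attention, and the LCDN denoising branch and the one-to-one matching loss are omitted, so treat this as a sketch rather than the paper's model.

```python
import torch
import torch.nn as nn

class ToyLineDetector(nn.Module):
    """DETR-style line-segment head: each learned query regresses two endpoints.
    A plain transformer decoder stands in for deformable attention."""
    def __init__(self, d_model=256, num_queries=100):
        super().__init__()
        self.backbone = nn.Conv2d(3, d_model, 7, stride=8, padding=3)  # crude feature map
        self.queries = nn.Embedding(num_queries, d_model)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True), num_layers=2)
        self.endpoint_head = nn.Linear(d_model, 4)   # (x1, y1, x2, y2) in [0, 1]
        self.score_head = nn.Linear(d_model, 1)      # line / no-line logit

    def forward(self, images):                                         # (B, 3, H, W)
        feats = self.backbone(images).flatten(2).transpose(1, 2)       # (B, HW/64, d)
        q = self.queries.weight.unsqueeze(0).expand(images.size(0), -1, -1)
        h = self.decoder(q, feats)                   # line queries attend to image tokens
        return self.endpoint_head(h).sigmoid(), self.score_head(h)

model = ToyLineDetector()
endpoints, scores = model(torch.rand(1, 3, 256, 256))
print(endpoints.shape, scores.shape)                 # (1, 100, 4), (1, 100, 1)
```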
Nov 07, 2024
Abstract: In this paper, we present a method for line segment detection in images based on a semi-supervised framework. By leveraging a consistency loss over differently augmented and perturbed unlabeled images together with a small amount of labeled data, we obtain results comparable to fully supervised methods. This opens up application scenarios where annotation is difficult or expensive, as well as domain-specific adaptation of models. We are specifically interested in real-time and online applications and therefore investigate small and efficient learning backbones. To our knowledge, our method is the first to target line detection using modern state-of-the-art methodologies for semi-supervised learning. We test the method on both standard benchmarks and domain-specific scenarios for forestry applications, showing the tractability of the proposed method.
* 9 pages, 6 figures, 7 tables
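A minimal sketch of the semi-supervised recipe the abstract describes: a supervised loss on the small labeled batch plus a consistency loss that asks predictions on two differently perturbed views of the same unlabeled images to agree. The toy network, augmentations, and loss weighting below are placeholders, not the authors' choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Any line/segmentation network could stand in here; this tiny head is a placeholder.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(16, 1, 3, padding=1))

def weak_aug(x):    # e.g. light noise / brightness jitter
    return (x + 0.05 * torch.randn_like(x)).clamp(0, 1)

def strong_aug(x):  # e.g. heavier perturbation
    return (x + 0.2 * torch.randn_like(x)).clamp(0, 1)

labeled, labels = torch.rand(4, 3, 64, 64), torch.rand(4, 1, 64, 64)
unlabeled = torch.rand(8, 3, 64, 64)

# Supervised term on the small labeled set.
sup_loss = F.binary_cross_entropy_with_logits(model(labeled), labels)

# Consistency term: predictions on two perturbed views of the same unlabeled
# image should agree; the weakly augmented view acts as the target.
with torch.no_grad():
    target = torch.sigmoid(model(weak_aug(unlabeled)))
cons_loss = F.mse_loss(torch.sigmoid(model(strong_aug(unlabeled))), target)

loss = sup_loss + 1.0 * cons_loss   # the weighting is a tunable hyperparameter
loss.backward()
```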
Nov 16, 2024
Abstract: Lane detection involves identifying lanes on the road and accurately determining their location and shape, a crucial capability for modern assisted and autonomous driving systems. However, several unique properties of lanes pose challenges for detection methods. The lack of distinctive features can cause lane detection algorithms to be confused by other objects with similar appearances. Additionally, the varying number of lanes and the diversity of lane line patterns, such as solid, broken, single, double, merging, and splitting lines, further complicate the task. To address these challenges, Deep Learning (DL) approaches can be employed in various ways, and combining DL models with an attention mechanism has recently emerged as a promising direction. In this context, two deep learning-based lane recognition methods are proposed in this study. The first employs the Feature Pyramid Network (FPN) model and reaches 87.59% accuracy in detecting road lanes. The second incorporates attention layers into the U-Net model, significantly boosting semantic segmentation performance: the attention-augmented model achieves 98.98% accuracy, far surpassing the basic U-Net and outperforming existing methods in a comparative analysis. These findings pave the way for more effective and reliable road lane detection methods, advancing the capabilities of modern assisted and autonomous driving systems.
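The second method adds attention layers to a U-Net. A generic attention gate in the Attention U-Net style, sketched below in PyTorch, shows how decoder features can gate the encoder skip connections before concatenation; the channel sizes are arbitrary and the exact attention layers used in the paper may differ.

```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Generic Attention U-Net style gate: gates encoder skip features with a
    signal from the decoder before they are concatenated."""
    def __init__(self, skip_ch, gate_ch, inter_ch):
        super().__init__()
        self.theta = nn.Conv2d(skip_ch, inter_ch, 1)
        self.phi = nn.Conv2d(gate_ch, inter_ch, 1)
        self.psi = nn.Conv2d(inter_ch, 1, 1)

    def forward(self, skip, gate):
        # skip: encoder features; gate: decoder features at the same resolution
        attn = torch.sigmoid(self.psi(torch.relu(self.theta(skip) + self.phi(gate))))
        return skip * attn          # suppress irrelevant regions, keep lane-like ones

gate = AttentionGate(skip_ch=64, gate_ch=128, inter_ch=32)
skip_feats = torch.rand(1, 64, 80, 160)
decoder_feats = torch.rand(1, 128, 80, 160)
print(gate(skip_feats, decoder_feats).shape)   # torch.Size([1, 64, 80, 160])
```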
Nov 13, 2024
Abstract: Purpose: High-speed video (HSV) phase detection (PD) segmentation is vital in nuclear reactors, chemical processing, and electronics cooling for detecting vapor, liquid, and microlayer phases. Traditional segmentation models face pixel-level accuracy and generalization issues on multimodal data. MSEG-VCUQ introduces VideoSAM, a hybrid framework leveraging convolutional neural networks (CNNs) and transformer-based vision models to enhance segmentation accuracy and generalizability across complex multimodal PD tasks. Methods: VideoSAM combines a U-Net CNN and the Segment Anything Model (SAM) for advanced feature extraction and segmentation across diverse HSV PD modalities, spanning fluids such as water, FC-72, nitrogen, and argon under varied heat flux conditions. The framework also incorporates uncertainty quantification (UQ) to assess pixel-based discretization errors, delivering reliable metrics such as contact line density and dry area fraction under experimental conditions. Results: VideoSAM outperforms SAM and modality-specific CNN models in segmentation accuracy, excelling in environments with complex phase boundaries, overlapping bubbles, and dynamic liquid-vapor interactions. Its hybrid architecture supports cross-dataset generalization, adapting effectively to varying modalities. The UQ module provides accurate error estimates, enhancing the reliability of segmentation outputs for advanced HSV PD research. Conclusion: MSEG-VCUQ, via VideoSAM, offers a robust solution for HSV PD segmentation, addressing previous limitations with advanced deep learning and UQ techniques. The open-source datasets and tools introduced enable scalable, precise, and adaptable segmentation for multimodal PD datasets, supporting advancements in HSV analysis and autonomous experimentation. The code and data used in this paper are publicly available at: \url{https://github.com/chikap421/mseg_vcuq}
* Under Review in EAAI
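Two of the metrics the abstract mentions, dry area fraction and contact line density, can be illustrated directly from a binary phase mask. The sketch below uses generic image-processing approximations (boundary length counted from 4-neighbour transitions) and an assumed pixel size; it is not the paper's UQ formulation.

```python
import numpy as np

def phase_metrics(dry_mask, pixel_size_m):
    """Toy estimates of dry-area fraction and contact-line density from a
    binary mask (True = dry/vapor pixel)."""
    dry_fraction = dry_mask.mean()

    # Contact line ~= boundary between dry and wetted pixels; approximate its
    # length by counting 4-neighbour transitions in the mask.
    horiz = np.abs(np.diff(dry_mask.astype(np.int8), axis=1)).sum()
    vert = np.abs(np.diff(dry_mask.astype(np.int8), axis=0)).sum()
    boundary_len_m = (horiz + vert) * pixel_size_m

    area_m2 = dry_mask.size * pixel_size_m ** 2
    return dry_fraction, boundary_len_m / area_m2   # fraction, 1/m

mask = np.zeros((480, 640), dtype=bool)
mask[100:200, 150:300] = True                        # one synthetic dry patch
print(phase_metrics(mask, pixel_size_m=10e-6))
```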
Nov 04, 2024
Abstract: Objects in the real world rarely occur in isolation and exhibit typical arrangements governed by their independent utility and their expected interactions with humans and other objects in the scene. For example, a chair is expected near a table, and a computer is expected on top of it. Humans use this spatial context and relative placement as an important cue for visual recognition in cases of ambiguity. Similar to humans, DNNs exploit contextual information from data to learn representations. Our research focuses on harnessing the contextual aspects of visual data to optimize data annotation and enhance the training of deep networks. Our contributions can be summarized as follows: (1) we introduce the notion of contextual diversity for active learning (CDAL) and show its applicability in three different visual tasks: semantic segmentation, object detection, and image classification; (2) we propose a data-repair algorithm to curate contextually fair data to reduce model bias, enabling the model to detect objects outside their obvious context; (3) we propose class-based annotation, where contextually relevant classes that are complementary for model training under domain shift are selected. Recognizing the importance of well-curated data, we also emphasize the necessity of involving humans in the loop to achieve accurate annotations and to develop novel interaction strategies that allow humans to serve as fact-checkers. In line with this, we are working on an image retrieval system for wildlife camera-trap images and a reliable warning system for poor-quality rural roads. For large-scale annotation, we employ a strategic combination of human expertise and zero-shot models, while also integrating human input at various stages for continuous feedback.
* ICVGIP, Young Researchers Symposium
Sep 06, 2024
Abstract: In this research, we introduce a unified end-to-end Automated Defect Classification-Detection-Segmentation (ADCDS) framework for classifying, detecting, and segmenting multiple instances of semiconductor defects at advanced nodes. The framework consists of two modules: (a) a defect detection module, followed by (b) a defect segmentation module. The detection module employs Deformable DETR to classify and detect nano-scale defects, while the segmentation module utilizes BoxSnake. BoxSnake facilitates box-supervised instance segmentation of nano-scale defects, supported by the former module. This simplifies the process by eliminating the laborious requirement for ground-truth pixel-wise mask annotation by human experts, which is typically associated with training conventional segmentation models. We have evaluated the performance of our ADCDS framework using two distinct process datasets from real wafers, namely ADI and AEI, specifically focusing on line-space patterns. We demonstrate the applicability and significance of the proposed methodology, particularly for nano-scale segmentation and the generation of binary defect masks, on the challenging ADI SEM dataset, where ground-truth pixel-wise segmentation annotations were unavailable. Furthermore, we present a comparative analysis of our framework against previous approaches to demonstrate its effectiveness. Our framework achieves an overall mAP@IoU0.5 of 72.19 for detection and 78.86 for segmentation on the ADI dataset. For the AEI dataset, these metrics are 90.38 for detection and 95.48 for segmentation. Thus, the proposed framework effectively fulfils the requirements of advanced defect analysis while addressing significant constraints.
* Accepted in the ECCV 2024 2nd Workshop on Vision-based InduStrial InspectiON (VISION)
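BoxSnake-style box-supervised instance segmentation replaces pixel-wise mask labels with bounding boxes. One common ingredient of such methods is a projection loss: the predicted mask's per-row and per-column maxima should match the ground-truth box projected onto each axis. The PyTorch sketch below illustrates only that general idea; BoxSnake's actual objective differs, and the Deformable DETR detection stage is not shown.

```python
import torch
import torch.nn.functional as F

def box_projection_loss(mask_logits, boxes, image_size):
    """Generic box-supervision term: mask row/column maxima should match the
    ground-truth box projected onto each axis (not BoxSnake's exact loss)."""
    H, W = image_size
    probs = mask_logits.sigmoid()                    # (N, H, W), one mask per defect
    loss = 0.0
    for p, (x1, y1, x2, y2) in zip(probs, boxes):    # boxes in pixel coordinates
        tx = torch.zeros(W); tx[int(x1):int(x2)] = 1.0   # box projection on the x-axis
        ty = torch.zeros(H); ty[int(y1):int(y2)] = 1.0   # box projection on the y-axis
        loss = loss + F.binary_cross_entropy(p.max(dim=0).values, tx) \
                    + F.binary_cross_entropy(p.max(dim=1).values, ty)
    return loss / len(boxes)

mask_logits = torch.randn(2, 64, 64, requires_grad=True)
boxes = [(10, 12, 40, 50), (5, 5, 20, 30)]
box_projection_loss(mask_logits, boxes, (64, 64)).backward()
```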
Sep 11, 2024
Abstract: Computed tomography (CT) reconstruction plays a crucial role in industrial nondestructive testing and medical diagnosis. Sparse-view CT reconstruction aims to reconstruct high-quality CT images from only a small number of projections, which helps improve the inspection speed of industrial assembly lines and reduces radiation dose in medical scenarios. Sparse CT reconstruction methods based on implicit neural representations (INRs) have recently shown promising performance, but they still produce artifacts because useful prior information is difficult to obtain. In this work, we incorporate a powerful prior: the total number of material categories in the object. To exploit this prior, we design AC-IND, a self-supervised method based on Attenuation Coefficient Estimation and Implicit Neural Distribution. Specifically, our method first transforms the traditional INR from a scalar mapping to a probability-distribution mapping. We then design a compact attenuation coefficient estimator initialized with values from a rough reconstruction and a fast segmentation. Finally, our algorithm completes the CT reconstruction by jointly optimizing the estimator and the generated distribution. Experiments show that our method not only outperforms comparative methods in sparse-view CT reconstruction but can also automatically generate semantic segmentation maps.
* 12 pages
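The core idea of mapping each coordinate to a probability distribution over material categories, combined with a small learnable set of attenuation coefficients, can be sketched in a few lines of PyTorch. Layer widths, the number of materials, and the initial coefficient values below are illustrative assumptions; the paper's estimator initialization and the ray-integration reconstruction loss are not shown.

```python
import torch
import torch.nn as nn

class DistributionINR(nn.Module):
    """Sketch of the abstract's idea: an INR maps a 2D coordinate to a
    distribution over K material categories, and learnable attenuation
    coefficients turn that distribution into a scalar attenuation."""
    def __init__(self, num_materials=3, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_materials),
        )
        # One attenuation coefficient per material; the paper initializes these
        # from a rough reconstruction, the values here are arbitrary.
        self.mu = nn.Parameter(torch.linspace(0.0, 0.5, num_materials))

    def forward(self, coords):                        # coords: (N, 2) in [-1, 1]
        probs = self.mlp(coords).softmax(dim=-1)      # (N, K) material distribution
        attenuation = (probs * self.mu).sum(dim=-1)   # expected attenuation per point
        return attenuation, probs

inr = DistributionINR()
pts = torch.rand(1024, 2) * 2 - 1                     # sample points along simulated rays
att, probs = inr(pts)
print(att.shape, probs.argmax(dim=-1).shape)          # per-point attenuation and labels
```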
Aug 22, 2024
Abstract: Current parking area perception algorithms primarily focus on detecting vacant slots within a limited range, relying on error-prone homographic projection for both labeling and inference. However, recent advancements in Advanced Driver Assistance Systems (ADAS) require interaction with end-users through comprehensive and intelligent Human-Machine Interfaces (HMIs). These interfaces should present a complete perception of the parking area, ranging from the entry lines of vacant slots to the orientation of other parked vehicles. This paper introduces Multi-Task Fisheye Cross View Transformers (MT F-CVT), which leverages features from a four-camera fisheye Surround-view Camera System (SVCS) with multi-head attention to create a detailed Bird's-Eye View (BEV) grid feature map. Features are processed by both a segmentation decoder and a Polygon-Yolo-based object detection decoder for parking slots and vehicles. Trained on data labeled using LiDAR, MT F-CVT positions objects within a 25 m x 25 m real open-road scene with an average error of only 20 cm. Our larger model achieves an F-1 score of 0.89. Moreover, the smaller model operates at 16 fps on an Nvidia Jetson Orin embedded board, with detection results similar to the larger one. MT F-CVT demonstrates robust generalization capability across different vehicles and camera rig configurations. A demo video from an unseen vehicle and camera rig is available at: https://streamable.com/jjw54x.
* 26th Irish Machine Vision and Image Processing Conference, Data-Driven Autonomy Workshop (matching camera-ready version)
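As a rough picture of how a BEV grid can be built from multiple cameras with attention, the sketch below lets a learned grid of BEV queries cross-attend to tokens from four camera feature maps. The backbone, grid resolution, single attention layer, and two-class head are placeholder choices, not MT F-CVT's architecture; the fisheye geometry handling and the Polygon-Yolo decoder are omitted.

```python
import torch
import torch.nn as nn

class ToyCrossViewBEV(nn.Module):
    """Generic cross-view attention sketch: learned BEV queries attend to
    tokens from four camera feature maps (dimensions are illustrative)."""
    def __init__(self, d_model=128, bev_size=25):    # e.g. a 25x25 grid over 25 m x 25 m
        super().__init__()
        self.cam_encoder = nn.Conv2d(3, d_model, 7, stride=16, padding=3)
        self.bev_queries = nn.Parameter(torch.randn(bev_size * bev_size, d_model))
        self.attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
        self.seg_head = nn.Linear(d_model, 2)        # e.g. parking-slot / vehicle logits

    def forward(self, cams):                          # cams: (B, 4, 3, H, W)
        B = cams.size(0)
        feats = self.cam_encoder(cams.flatten(0, 1))  # (B*4, d, h, w)
        tokens = feats.flatten(2).transpose(1, 2)     # (B*4, h*w, d)
        tokens = tokens.reshape(B, -1, tokens.size(-1))   # concatenate the 4 views
        q = self.bev_queries.unsqueeze(0).expand(B, -1, -1)
        bev, _ = self.attn(q, tokens, tokens)         # BEV cells gather multi-view context
        return self.seg_head(bev)                     # (B, 625, 2) per-cell predictions

model = ToyCrossViewBEV()
out = model(torch.rand(1, 4, 3, 224, 224))
print(out.shape)                                      # torch.Size([1, 625, 2])
```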