Abstract: Robustness and generalizability in medical image segmentation are often hindered by the scarcity and limited diversity of training data, which stands in contrast to the variability encountered at inference time. While conventional strategies -- such as domain-specific augmentation, specialized architectures, and tailored training procedures -- can alleviate these issues, they depend on the availability and reliability of domain knowledge. When such knowledge is unavailable, misleading, or improperly applied, performance may deteriorate. In response, we introduce a novel, domain-agnostic, add-on, and data-driven strategy inspired by image stacking in image denoising. Termed ``semantic stacking,'' our method estimates a denoised semantic representation that complements the conventional segmentation loss during training. This method makes no domain-specific assumptions, making it broadly applicable across diverse image modalities, model architectures, and augmentation techniques. Extensive experiments validate that our approach improves segmentation performance under diverse conditions. Code is available at https://github.com/ymp5078/Semantic-Stacking.
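The abstract does not give implementation details, but the core idea suggests a simple training-time construction: average the semantic features of several augmented views of an image to obtain a denoised target, then regularize each view's features toward it. The sketch below is a minimal PyTorch reading of that idea; `semantic_stacking_loss`, the detached mean target, the MSE distance, and the 0.1 weight are all illustrative assumptions, not the paper's actual formulation.

```python
# Minimal sketch (assumptions, not the paper's code): form a "denoised"
# semantic target by stacking the encoder features of K augmented views,
# then pull each view's features toward that target.
import torch
import torch.nn.functional as F

def semantic_stacking_loss(encoder, views):
    """encoder: maps (B, C, H, W) images to feature maps.
    views: list of K differently augmented versions of the same batch."""
    feats = [encoder(v) for v in views]               # per-view semantic maps
    target = torch.stack(feats).mean(dim=0).detach()  # stacked target, no grad
    return sum(F.mse_loss(f, target) for f in feats) / len(feats)

# Assumed usage alongside the usual segmentation objective:
#   seg_loss = criterion(model(image), mask)
#   loss = seg_loss + 0.1 * semantic_stacking_loss(model.encoder, views)
```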
Abstract: Semantic segmentation is a core task in computer vision. Existing methods are generally divided into two categories: automatic and interactive. Interactive approaches, exemplified by the Segment Anything Model (SAM), have shown promise as pre-trained models. However, current adaptation strategies for these models tend to lean toward either the automatic or the interactive approach: interactive methods depend on user-provided prompts to operate, while automatic ones bypass interactive promptability entirely. Addressing these limitations, we introduce a novel paradigm and its first model: the Automatic and Interactive Segment Anything Model (AI-SAM). In this paradigm, we conduct a comprehensive analysis of prompt quality and introduce the pioneering Automatic and Interactive Prompter (AI-Prompter), which automatically generates initial point prompts while accepting additional user inputs. Our experimental results demonstrate AI-SAM's effectiveness in the automatic setting, where it achieves state-of-the-art performance. Significantly, it retains the flexibility to incorporate additional user prompts, further enhancing its performance. The project page is available at https://github.com/ymp5078/AI-SAM.
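As a rough illustration of the paradigm (not AI-SAM's actual AI-Prompter, which is a learned module described in the paper), the following PyTorch sketch shows how automatically generated point prompts can be concatenated with optional user-supplied points before conditioning a SAM-style mask decoder. `AutoPrompter`, the top-k peak-picking heuristic, and all tensor shapes are hypothetical stand-ins.

```python
# Toy stand-in for the automatic-plus-interactive prompting idea. The real
# AI-Prompter is a learned module; here a 1x1 conv scores image features and
# the k strongest locations become the initial point prompts.
import torch

class AutoPrompter(torch.nn.Module):
    def __init__(self, feat_dim: int, k: int = 3):
        super().__init__()
        self.score = torch.nn.Conv2d(feat_dim, 1, kernel_size=1)
        self.k = k

    def forward(self, feats):                   # feats: (B, C, H, W)
        heat = self.score(feats).flatten(1)     # (B, H*W) point scores
        idx = heat.topk(self.k, dim=1).indices  # indices of top-k peaks
        w = feats.shape[-1]
        return torch.stack([idx // w, idx % w], dim=-1)  # (B, k, 2) as (row, col)

prompter = AutoPrompter(feat_dim=256)
feats = torch.randn(1, 256, 64, 64)             # stand-in image features
points = prompter(feats)                        # automatic point prompts
user_points = torch.tensor([[[10, 20]]])        # optional interactive clicks
points = torch.cat([points, user_points], dim=1)
# `points` would then condition a SAM-style mask decoder (not shown).
```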