Sacha Arnoud

End-to-End Interpretation of the French Street Name Signs Dataset

Feb 13, 2017

Raymond Smith, Chunhui Gu, Dar-Shyang Lee, Huiyi Hu, Ranjith Unnikrishnan, Julian Ibarz, Sacha Arnoud, Sophia Lin

Abstract: We introduce the French Street Name Signs (FSNS) Dataset, consisting of more than a million images of street name signs cropped from Google Street View images of France. Each image contains several views of the same street name sign. Every image has normalized, title-case-folded ground-truth text as it would appear on a map. We believe that the FSNS dataset is large and complex enough to train a deep network of significant complexity to solve the street name extraction problem "end-to-end", or to explore the design trade-offs between a single complex engineered network and multiple sub-networks designed and trained to solve sub-problems. We present such an "end-to-end" network/graph for TensorFlow and its results on the FSNS dataset.

* Computer Vision - ECCV 2016 Workshops Volume 9913 of the series Lecture Notes in Computer Science pp 411-426
* Presented at the IWRR workshop at ECCV 2016
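
The abstract above describes the ground truth as normalized, title-case-folded text. The listing does not spell out the exact rules, so the following is only a minimal sketch of one plausible reading: each word is capitalized except for a small set of French particles. The function name `fold_title_case` and the particle list are assumptions made for illustration, not taken from the paper or the dataset.

```python
# Hypothetical set of French particles kept lowercase; an assumption for
# illustration, not the rule set used to build the FSNS ground truth.
LOWERCASE_PARTICLES = {"de", "du", "des", "la", "le", "les", "et", "au", "aux"}


def fold_title_case(text: str) -> str:
    """Lowercase the text, collapse whitespace, then capitalize each word
    except particles that are not the first word."""
    words = text.lower().split()
    folded = []
    for index, word in enumerate(words):
        if index > 0 and word in LOWERCASE_PARTICLES:
            folded.append(word)
        else:
            folded.append(word[:1].upper() + word[1:])
    return " ".join(folded)


print(fold_title_case("RUE DE LA PAIX"))
```

This sketch does not handle elided forms such as "d'" or "l'", which a real normalizer for French street names would need to treat separately.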

Large Scale Business Discovery from Street Level Imagery

Feb 02, 2016

Qian Yu, Christian Szegedy, Martin C. Stumpe, Liron Yatziv, Vinay Shet, Julian Ibarz, Sacha Arnoud

Abstract: Search with local intent is becoming increasingly useful due to the popularity of mobile devices. The creation and maintenance of accurate listings of local businesses worldwide is time-consuming and expensive. In this paper, we propose an approach to automatically discover businesses that are visible in street-level imagery. Precise business store front detection enables accurate geo-location of businesses, and further provides input for business categorization, listing generation, etc. The large variety of business categories in different countries makes this a very challenging problem. Moreover, manual annotation is prohibitive due to the scale of this problem. We propose the use of a MultiBox-based approach that takes input image pixels and directly outputs store front bounding boxes. This end-to-end learning approach preempts the need for hand modeling either the proposal generation phase or the post-processing phase, leveraging large labelled training datasets. We demonstrate that our approach outperforms state-of-the-art detection techniques by a large margin in terms of performance and run-time efficiency. In the evaluation, we show that this approach achieves human accuracy in the low-recall settings. We also provide an end-to-end evaluation of business discovery in the real world.
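
The abstract above describes a detector that outputs store front bounding boxes. A common way to decide whether a predicted box matches an annotated one is intersection over union (IoU). The sketch below shows that overlap measure under the assumption of axis-aligned boxes given as corner coordinates; the `Box` type and the function names are illustrative, and the listing does not state which matching criterion the paper itself uses.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box in pixel coordinates (hypothetical layout)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def area(box: Box) -> float:
    # Degenerate or inverted boxes have zero area.
    return max(0.0, box.x_max - box.x_min) * max(0.0, box.y_max - box.y_min)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes, in [0, 1]."""
    overlap = Box(
        max(a.x_min, b.x_min),
        max(a.y_min, b.y_min),
        min(a.x_max, b.x_max),
        min(a.y_max, b.y_max),
    )
    intersection = area(overlap)
    union = area(a) + area(b) - intersection
    return intersection / union if union > 0 else 0.0


predicted = Box(0, 0, 2, 2)
annotated = Box(1, 1, 3, 3)
print(iou(predicted, annotated))
```

A detection is then typically counted as correct when its IoU with an annotated box exceeds a chosen threshold, and sweeping the detector's confidence threshold traces out the precision and recall trade-off that the abstract refers to.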

Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks

Apr 14, 2014

Ian J. Goodfellow, Yaroslav Bulatov, Julian Ibarz, Sacha Arnoud, Vinay Shet

Abstract: Recognizing arbitrary multi-character text in unconstrained natural photographs is a hard problem. In this paper, we address an equally hard sub-problem in this domain, namely recognizing arbitrary multi-digit numbers from Street View imagery. Traditional approaches to solve this problem typically separate out the localization, segmentation, and recognition steps. In this paper we propose a unified approach that integrates these three steps via the use of a deep convolutional neural network that operates directly on the image pixels. We employ the DistBelief implementation of deep neural networks in order to train large, distributed neural networks on high-quality images. We find that the performance of this approach increases with the depth of the convolutional network, with the best performance occurring in the deepest architecture we trained, with eleven hidden layers. We evaluate this approach on the publicly available SVHN dataset and achieve over $96\%$ accuracy in recognizing complete street numbers. We show that on a per-digit recognition task, we improve upon the state-of-the-art, achieving $97.84\%$ accuracy. We also evaluate this approach on an even more challenging dataset generated from Street View imagery containing several tens of millions of street number annotations and achieve over $90\%$ accuracy. To further explore the applicability of the proposed system to broader text recognition tasks, we apply it to synthetic distorted text from reCAPTCHA. reCAPTCHA is one of the most secure reverse Turing tests that uses distorted text to distinguish humans from bots. We report a $99.8\%$ accuracy on the hardest category of reCAPTCHA. Our evaluations on both tasks indicate that at specific operating thresholds, the performance of the proposed system is comparable to, and in some cases exceeds, that of human operators.
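
The abstract above reports accuracy both for complete street numbers and for individual digits. The sketch below illustrates why the whole-number figure is the stricter one: a single wrong digit makes the entire number count as an error. The function names are hypothetical, and the position-wise digit comparison is a simplified stand-in, not the per-digit benchmark protocol used in the paper.

```python
from itertools import zip_longest
from typing import Sequence


def sequence_accuracy(predictions: Sequence[str], truths: Sequence[str]) -> float:
    """Fraction of numbers whose full digit string is exactly right."""
    if len(predictions) != len(truths):
        raise ValueError("predictions and truths must have the same length")
    if not truths:
        return 0.0
    matches = sum(p == t for p, t in zip(predictions, truths))
    return matches / len(truths)


def per_digit_accuracy(predictions: Sequence[str], truths: Sequence[str]) -> float:
    """Fraction of digit positions that agree; a missing or extra digit
    counts as one wrong position."""
    correct = 0
    total = 0
    for predicted, truth in zip(predictions, truths):
        for p_digit, t_digit in zip_longest(predicted, truth):
            total += 1
            correct += p_digit == t_digit
    return correct / total if total else 0.0


predictions = ["123", "45", "6"]
truths = ["123", "46", "6"]
print(sequence_accuracy(predictions, truths))
print(per_digit_accuracy(predictions, truths))
```

In this toy example one wrong digit out of six costs one of three whole numbers, so the sequence-level score drops much further than the digit-level score.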
