Miquel Martí

Real-Time Semantic Stereo Matching

Oct 01, 2019

Pier Luigi Dovesi, Matteo Poggi, Lorenzo Andraghetti, Miquel Martí, Hedvig Kjellström, Alessandro Pieropan, Stefano Mattoccia

Abstract: Scene understanding is paramount in robotics, self-navigation, augmented reality, and many other fields. To fully accomplish this task, an autonomous agent has to infer the 3D structure of the sensed scene (to know where it is looking) and its content (to know what it sees). To tackle the two tasks, deep neural networks trained to infer semantic segmentation and depth from stereo images are often the preferred choices. Specifically, Semantic Stereo Matching can be tackled by either standalone models trained for the two tasks independently or joint end-to-end architectures. Nonetheless, as proposed so far, both solutions are inefficient: the former requires two forward passes, while the latter relies on a single complex network, although jointly tackling both tasks is usually beneficial in terms of accuracy. In this paper, we propose a single compact and lightweight architecture for real-time semantic stereo matching. Our framework relies on coarse-to-fine estimations in a multi-stage fashion, allowing: i) very fast inference even on embedded devices, with marginal drops in accuracy compared to state-of-the-art networks, and ii) trading accuracy for speed, according to the specific application requirements. Experimental results on high-end GPUs as well as on an embedded Jetson TX2 confirm the superiority of semantic stereo matching compared to standalone tasks and highlight the versatility of our framework on any hardware and for any application.

* 8 pages, 3 figures
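
The coarse-to-fine, multi-stage idea described in the abstract can be illustrated with a toy sketch. This is not the authors' network: the function names (`downsample`, `upsample`, `coarse_to_fine`, `mean_abs_error`), the 1D "disparity" signal, and the oracle residuals standing in for learned refinement stages are all assumptions made for illustration. It only shows how stopping at an earlier stage yields a coarser but cheaper estimate.

```python
def downsample(signal, factor):
    """Average-pool a 1D list by an integer factor (length must be divisible)."""
    return [sum(signal[i:i + factor]) / factor
            for i in range(0, len(signal), factor)]


def upsample(signal, factor):
    """Nearest-neighbour upsampling: repeat every value `factor` times."""
    return [v for v in signal for _ in range(factor)]


def coarse_to_fine(signal, factors=(4, 2, 1), max_stages=None):
    """Return one full-resolution estimate per executed stage.

    Each stage works at 1/factor resolution. The first stage predicts
    directly; later stages upsample the previous estimate and add a residual.
    The residual is computed from the true signal here (an oracle), standing
    in for what a learned stage would predict.
    """
    stages = factors if max_stages is None else factors[:max_stages]
    estimate, prev_factor, outputs = None, None, []
    for factor in stages:
        target = downsample(signal, factor)
        if estimate is None:
            estimate = target
        else:
            coarse = upsample(estimate, prev_factor // factor)
            residual = [t - c for t, c in zip(target, coarse)]
            estimate = [c + r for c, r in zip(coarse, residual)]
        prev_factor = factor
        # Bring the stage output back to full resolution for comparison.
        outputs.append(upsample(estimate, factor))
    return outputs


def mean_abs_error(a, b):
    """Mean absolute difference between two equally long lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Hypothetical 1D disparity profile, used only for this illustration.
disparity = [1.0, 2.0, 4.0, 8.0, 8.0, 6.0, 3.0, 1.0]
for stage, est in enumerate(coarse_to_fine(disparity), start=1):
    print(stage, round(mean_abs_error(est, disparity), 3))
```

Passing `max_stages=1` or `max_stages=2` returns only the earlier, coarser estimates, which mirrors the accuracy-for-speed trade-off in spirit; the actual cost and accuracy of each stage in the paper depend on the network itself.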

GANtruth - an unpaired image-to-image translation method for driving scenarios

Nov 26, 2018

Sebastian Bujwid, Miquel Martí, Hossein Azizpour, Alessandro Pieropan

Abstract: Synthetic image translation has significant potential in autonomous transportation systems. This is due to the expense of data collection and annotation as well as the unmanageable diversity of real-world situations. The main issue with unpaired image-to-image translation is the ill-posed nature of the problem. In this work, we propose a novel method for constraining the output space of unpaired image-to-image translation. We make the assumption that the environment of the source domain is known (e.g. synthetically generated), and we propose to explicitly enforce preservation of the ground-truth labels on the translated images. We experiment on preserving ground-truth information such as semantic segmentation, disparity, and instance segmentation. We show significant evidence that our method achieves improved performance over the state-of-the-art model of UNIT for translating images from SYNTHIA to Cityscapes. The generated images are perceived as more realistic in human surveys and outperform UNIT when used in a domain adaptation scenario for semantic segmentation.

* 32nd Conference on Neural Information Processing Systems (NeurIPS), Machine Learning for Intelligent Transportation Systems Workshop, Montréal, Canada. 2018

A multitask deep learning model for real-time deployment in embedded systems

Oct 31, 2017

Miquel Martí, Atsuto Maki

Abstract: We propose an approach to Multitask Learning (MTL) to make deep learning models faster and lighter for applications in which multiple tasks need to be solved simultaneously, which is particularly useful in embedded, real-time systems. We develop a multitask model for both Object Detection and Semantic Segmentation and analyze the challenges that appear during its training. Our multitask network is 1.6x faster, lighter and uses less memory than deploying the single-task models in parallel. We conclude that MTL has the potential to give superior performance in exchange for a more complex training process that introduces challenges not present in single-task models.

* Swedish Symposium on Deep Learning 2017
* 2 pages, 5 figures. Poster presentation at Swedish Symposium on Deep Learning SSDL2017, Stockholm, Sweden. June 20-21, 2017
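
One reason a multitask model can be cheaper than two single-task models is that the feature extractor runs once and is shared by both heads. The sketch below illustrates only that point. Everything in it is hypothetical: `CountingBackbone`, the two toy heads, and the list-of-floats "image" are placeholders rather than the architecture from the paper, and the 1.6x figure is not reproduced here.

```python
class CountingBackbone:
    """Stand-in feature extractor that records how often it runs."""

    def __init__(self):
        self.calls = 0

    def __call__(self, image):
        self.calls += 1
        return [pixel * 0.5 for pixel in image]  # placeholder "features"


def detection_head(features):
    """Toy detection head: indices whose feature exceeds a threshold."""
    return [i for i, f in enumerate(features) if f > 1.0]


def segmentation_head(features):
    """Toy segmentation head: one binary label per position."""
    return [1 if f > 0.5 else 0 for f in features]


def run_single_task_models(image):
    """Two separate models: each one runs its own backbone."""
    det_backbone, seg_backbone = CountingBackbone(), CountingBackbone()
    detections = detection_head(det_backbone(image))
    mask = segmentation_head(seg_backbone(image))
    return detections, mask, det_backbone.calls + seg_backbone.calls


def run_multitask_model(image):
    """One model: the backbone runs once and feeds both heads."""
    backbone = CountingBackbone()
    features = backbone(image)
    return detection_head(features), segmentation_head(features), backbone.calls


# Hypothetical input, used only for this illustration.
image = [0.2, 1.4, 3.0, 0.8]
print(run_single_task_models(image))
print(run_multitask_model(image))
```

Both variants return the same outputs here because the toy backbones are identical; in practice, separately trained models would have different weights, and sharing one backbone is what makes multitask training harder, as the abstract notes.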

Okutama-Action: An Aerial View Video Dataset for Concurrent Human Action Detection

Jun 15, 2017

Mohammadamin Barekatain, Miquel Martí, Hsueh-Fu Shih, Samuel Murray, Kotaro Nakayama, Yutaka Matsuo, Helmut Prendinger

Abstract: Despite significant progress in the development of human action detection datasets and algorithms, no current dataset is representative of real-world aerial view scenarios. We present Okutama-Action, a new video dataset for aerial view concurrent human action detection. It consists of 43 minute-long fully-annotated sequences with 12 action classes. Okutama-Action features many challenges missing in current datasets, including dynamic transition of actions, significant changes in scale and aspect ratio, abrupt camera movement, as well as multi-labeled actors. As a result, our dataset is more challenging than existing ones, and will help push the field forward to enable real-world applications.

* Computer Vision and Pattern Recognition Workshops (CVPRW), Hawaii, USA, 2017
