Zhongkai Shangguan

XVO: Generalized Visual Odometry via Cross-Modal Self-Training

Oct 08, 2023

Lei Lai, Zhongkai Shangguan, Jimuyang Zhang, Eshed Ohn-Bar

Abstract: We propose XVO, a semi-supervised learning method for training generalized monocular Visual Odometry (VO) models with robust off-the-shelf operation across diverse datasets and settings. In contrast to standard monocular VO approaches, which often study a known calibration within a single dataset, XVO efficiently learns to recover relative pose with real-world scale from visual scene semantics, i.e., without relying on any known camera parameters. We optimize the motion estimation model via self-training from large amounts of unconstrained and heterogeneous dash camera videos available on YouTube. Our key contribution is twofold. First, we empirically demonstrate the benefits of semi-supervised training for learning a general-purpose direct VO regression network. Second, we demonstrate multi-modal supervision, including segmentation, flow, depth, and audio auxiliary prediction tasks, to facilitate generalized representations for the VO task. Specifically, we find that the audio prediction task significantly enhances the semi-supervised learning process while alleviating noisy pseudo-labels, particularly in highly dynamic and out-of-domain video data. Our proposed teacher network achieves state-of-the-art performance on the commonly used KITTI benchmark despite using no multi-frame optimization or knowledge of camera parameters. Combined with the proposed semi-supervised step, XVO demonstrates off-the-shelf knowledge transfer across diverse conditions on KITTI, nuScenes, and Argoverse without fine-tuning.

* ICCV 2023, Paris https://genxvo.github.io/

Trend and Thoughts: Understanding Climate Change Concern using Machine Learning and Social Media Data

Nov 06, 2021

Zhongkai Shangguan, Zihe Zheng, Lei Lin

Abstract: Compared to traditional survey methods, social media platforms such as Twitter provide a great opportunity to understand public opinion on climate change. In this paper, we constructed a massive climate change Twitter dataset and conducted a comprehensive analysis using machine learning. Using topic modeling and natural language processing, we show the relationship between the number of tweets about climate change and major climate events, the common topics that arise when people discuss climate change, and the trend of sentiment. Our dataset is published on Kaggle (https://www.kaggle.com/leonshangguan/climate-change-tweets-ids-until-aug-2021) and can be used in further research.

NTIRE 2021 Multi-modal Aerial View Object Classification Challenge

Jul 02, 2021

Jerrick Liu, Nathan Inkawhich, Oliver Nina, Radu Timofte, Sahil Jain, Bob Lee, Yuru Duan, Wei Wei, Lei Zhang, Songzheng Xu(+23 more)

Abstract: In this paper, we introduce the first Challenge on Multi-modal Aerial View Object Classification (MAVOC) in conjunction with the NTIRE 2021 workshop at CVPR. This challenge is composed of two different tracks using EO and SAR imagery. EO and SAR sensors each have different advantages and drawbacks. The purpose of this competition is to analyze how to use both sets of sensory information in complementary ways. We discuss the top methods submitted for this competition and evaluate their results on our blind test set. Our challenge results show a significant improvement of more than 15% in accuracy over our current baselines for each track of the competition.

* Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2021, 588-595
* 10 pages, 1 figure. Conference on Computer Vision and Pattern Recognition

Neural Process for Black-Box Model Optimization Under Bayesian Framework

Apr 03, 2021

Zhongkai Shangguan, Lei Lin, Wencheng Wu, Beilei Xu

Abstract: Many optimization problems arise in physical models where the relationships between model parameters and outputs are unknown or hard to track. These models are generally called black-box models because they can only be viewed in terms of inputs and outputs, without knowledge of their internal workings. Optimizing black-box model parameters has become increasingly expensive and time-consuming as the models have become more complex. Hence, developing effective and efficient black-box model optimization algorithms has become an important task. One powerful algorithm for solving such problems is Bayesian optimization, which can effectively estimate the model parameters that lead to the best performance, and the Gaussian Process (GP) has been one of the most widely used surrogate models in Bayesian optimization. However, the time complexity of GP scales cubically with the number of observed model outputs, and GP does not scale well to large parameter dimensions either. Consequently, it has been challenging for GP to optimize black-box models that need to query many observations and/or have many parameters. To overcome these drawbacks of GP, we propose a general Bayesian optimization algorithm that employs a Neural Process (NP) as the surrogate model to perform black-box model optimization, namely Neural Process for Bayesian Optimization (NPBO). To validate the benefits of NPBO, we compare it with four benchmark approaches on a power system parameter optimization problem and a series of seven benchmark Bayesian optimization problems. The results show that the proposed NPBO performs better than the other four benchmark approaches on the power system parameter optimization problem and competitively on the seven benchmark problems.

* This paper has been accepted to AAAI-MLPS 2021

Actor-Action Video Classification CSC 249/449 Spring 2020 Challenge Report

Aug 18, 2020

Jing Shi, Zhiheng Li, Haitian Zheng, Yihang Xu, Tianyou Xiao, Weitao Tan, Xiaoning Guo, Sizhe Li, Bin Yang, Zhexin Xu(+23 more)

Abstract: This technical report summarizes and compiles submissions from the Actor-Action video classification challenge, held as a final project in the CSC 249/449 Machine Vision course (Spring 2020) at the University of Rochester.
