Abstract: Recently, researchers have shown increased interest in harnessing Twitter data for dynamic monitoring of traffic conditions. The bag-of-words representation is a common method in the literature for modeling tweets and retrieving traffic information, yet it suffers from the curse of dimensionality and from sparsity. To address these issues, we propose a simple and robust framework built on top of word embeddings for distinguishing traffic-related tweets from non-traffic-related ones. In the proposed model, a tweet is classified as traffic-related if the semantic similarity between its words and a small set of traffic keywords exceeds a threshold value. Semantic similarity between words is captured by means of word-embedding models, which are unsupervised learning tools. The proposed model is simple enough to have only one trainable parameter. Its merits are demonstrated through several evaluation steps. The proposed model achieves a state-of-the-art test accuracy of 95.9%.
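The threshold rule described in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the toy three-dimensional vectors stand in for a pretrained word-embedding model, the keyword set and the default threshold are hypothetical, and taking the maximum cosine similarity over all word-keyword pairs is only one plausible way to aggregate similarities, since the abstract does not specify the aggregation.

```python
import math

# Hypothetical toy vectors standing in for a pretrained word-embedding model.
TOY_EMBEDDINGS = {
    "traffic": (0.9, 0.1, 0.0),
    "accident": (0.8, 0.2, 0.1),
    "congestion": (0.85, 0.15, 0.05),
    "highway": (0.7, 0.3, 0.1),
    "pizza": (0.0, 0.1, 0.9),
    "delicious": (0.05, 0.2, 0.85),
}

# Hypothetical small set of traffic keywords.
TRAFFIC_KEYWORDS = ("traffic", "accident")


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def tweet_score(tweet, embeddings=TOY_EMBEDDINGS, keywords=TRAFFIC_KEYWORDS):
    """Highest similarity between any known tweet word and any keyword.

    Assumption: max-pooling over word-keyword pairs; words without an
    embedding are ignored, and a tweet with no known words scores 0.0.
    """
    tokens = [tok for tok in tweet.lower().split() if tok in embeddings]
    if not tokens:
        return 0.0
    return max(
        cosine(embeddings[tok], embeddings[kw])
        for tok in tokens
        for kw in keywords
    )


def is_traffic_related(tweet, threshold=0.9):
    """Classify a tweet by comparing its score with a single threshold."""
    return tweet_score(tweet) > threshold


if __name__ == "__main__":
    for text in ("Heavy congestion on the highway", "Delicious pizza tonight"):
        print(text, "->", is_traffic_related(text))
```

In this sketch the threshold is the only value that would be tuned on labeled data; the embeddings themselves would come from an unsupervised model trained separately.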
Abstract: Identifying the distribution of users' transportation modes is an essential part of travel demand analysis and transportation planning. With the advent of ubiquitous GPS-enabled devices (e.g., smartphones), a cost-effective approach for inferring commuters' mobility modes is to leverage their GPS trajectories. A majority of studies have proposed mode inference models based on hand-crafted features and traditional machine learning algorithms. However, manual features have some major drawbacks, including vulnerability to traffic and environmental conditions and susceptibility to human bias in feature design. One way to overcome these issues is to use Convolutional Neural Network (CNN) schemes, which are capable of automatically deriving high-level features from the raw input. Accordingly, in this paper, we use CNN architectures to predict travel modes based only on raw GPS trajectories, where the modes are labeled as walk, bike, bus, driving, and train. Our key contribution is designing the layout of the CNN's input layer so that it is not only compatible with CNN schemes but also represents fundamental motion characteristics of a moving object, including speed, acceleration, jerk, and bearing rate. Furthermore, we improve the quality of GPS logs through several data preprocessing steps. Using the cleaned input layer, a variety of CNN configurations are evaluated to find the best CNN architecture. The highest accuracy of 84.8% is achieved through an ensemble of the best CNN configuration. We compare our methodology with traditional machine learning algorithms as well as the seminal and most related studies to demonstrate the superiority of our framework.
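The four motion characteristics named in this abstract can be derived from consecutive GPS fixes, as the following minimal sketch shows. It rests on stated assumptions rather than on the paper's exact definitions: each fix is a hypothetical `(latitude, longitude, time_in_seconds)` tuple, distance is computed with the haversine formula on a spherical Earth, derivatives are simple finite differences, and bearing rate is taken as the absolute change in bearing between consecutive segments. How these sequences are arranged into the CNN input layer is not covered here.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points (degrees)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial bearing from point 1 to point 2, in degrees within [0, 360)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlmb)
    return math.degrees(math.atan2(y, x)) % 360.0


def motion_channels(points):
    """Compute speed, acceleration, jerk, and bearing rate sequences.

    points: list of (lat, lon, t_seconds) with strictly increasing t.
    Each derived sequence is one element shorter than the one it is
    differenced from.
    """
    dts, speeds, bearings = [], [], []
    for (la1, lo1, t1), (la2, lo2, t2) in zip(points, points[1:]):
        dt = t2 - t1
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        dts.append(dt)
        speeds.append(haversine_m(la1, lo1, la2, lo2) / dt)  # m/s
        bearings.append(bearing_deg(la1, lo1, la2, lo2))      # degrees

    # Finite differences; the earlier segment's duration is used as the step.
    accels = [(speeds[i + 1] - speeds[i]) / dts[i] for i in range(len(speeds) - 1)]
    jerks = [(accels[i + 1] - accels[i]) / dts[i] for i in range(len(accels) - 1)]

    # Absolute change in heading, wrapped into [0, 180] degrees.
    bearing_rates = []
    for b1, b2 in zip(bearings, bearings[1:]):
        diff = abs(b2 - b1) % 360.0
        bearing_rates.append(min(diff, 360.0 - diff))

    return {
        "speed": speeds,
        "acceleration": accels,
        "jerk": jerks,
        "bearing_rate": bearing_rates,
    }


if __name__ == "__main__":
    # Hypothetical track: steady eastward movement along the equator.
    track = [(0.0, 0.0, 0), (0.0, 0.001, 10), (0.0, 0.002, 20), (0.0, 0.003, 30)]
    for name, values in motion_channels(track).items():
        print(name, values)
```

Because each quantity is a difference of the previous one, a trajectory segment would need padding or trimming to give all four sequences the same length before they could be stacked as input channels.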