Abstract: In the absence of nonverbal cues in text messaging, users express part of their emotions through emojis. Including emojis in the vocabulary of text-messaging language models can therefore significantly improve many natural language processing (NLP) applications, such as online communication analysis. However, word embedding models are usually trained on very large text corpora, such as Wikipedia or Google News, that contain very few samples with emojis. In this study, we create emojiSpace, a combined word-emoji embedding trained with the word2vec model from the Gensim library in Python. We trained emojiSpace on a corpus of more than 4 billion tweets and evaluated it on an extrinsic task: sentiment analysis of a Twitter dataset containing more than 67 million tweets. For this task, we compared the performance of two classifiers, random forest (RF) and linear support vector machine (SVM). We also compared emojiSpace with two other pre-trained embeddings and demonstrated that emojiSpace outperforms both.
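A combined word-emoji embedding requires each emoji to appear as its own vocabulary token alongside ordinary words. The abstract does not describe the preprocessing, so the following is only a minimal sketch of one possible tokenizer. The function name `tokenize`, the emoji code-point ranges, and the lowercasing step are all assumptions made for illustration.

```python
import re

# Assumption: a rough range covering the main emoji and pictograph blocks.
# ZWJ sequences, variation selectors, and skin-tone modifiers are not
# treated specially; each matching code point becomes its own token.
EMOJI = "\U0001F300-\U0001FAFF\u2600-\u27BF"

# One emoji code point, or a run of letters/digits with an optional
# apostrophe part (e.g. "don't").
TOKEN_RE = re.compile(rf"[{EMOJI}]|[^\W_]+(?:'[^\W_]+)?")


def tokenize(tweet: str) -> list[str]:
    """Lowercase a tweet and split it into word and single-emoji tokens."""
    return TOKEN_RE.findall(tweet.lower())
```

Token lists of this shape are the kind of input that Gensim's `Word2Vec` class accepts through its `sentences` argument, so words and emojis would share one vector space.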
Abstract: Dynamic models of occupancy patterns have been shown to be effective in optimizing building-systems operations. Previous research has relied on CO$_2$ sensors and vision-based techniques to determine occupancy patterns. Vision-based techniques provide highly accurate information; however, they are very intrusive. Therefore, motion or CO$_2$ sensors are more widely adopted. Volatile organic compounds (VOCs) are another class of pollutants originating from occupants, but only a limited number of studies have evaluated the impact of occupants on VOC levels. In this paper, continuous measurements of CO$_2$, VOC, light, temperature, and humidity were recorded in a 17,000 sq ft open office space for around four months. Using different statistical models (e.g., SVM, K-Nearest Neighbors, and Random Forest), we evaluated which combination of environmental factors provides the most accurate insight into occupant presence. Our preliminary results indicate that VOC is a good indicator of occupancy in some cases. We also conclude that proper feature selection and appropriate global occupancy detection models can reduce the cost and energy of data collection without a significant impact on accuracy.
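Evaluating "which combination of environmental factors" works best amounts to scoring a classifier on every subset of the five measured signals. The sketch below illustrates that idea under stated assumptions: it uses a small standard-library k-nearest-neighbors vote with leave-one-out evaluation as a stand-in, not the SVM, K-Nearest Neighbors, or Random Forest setups used in the study. The feature keys, function names, and the assumption that readings are already scaled to comparable ranges are hypothetical.

```python
from itertools import combinations
from math import dist

# Feature names taken from the measured signals; the dict keys are assumed.
FEATURES = ["co2", "voc", "light", "temperature", "humidity"]


def feature_subsets(features):
    """Yield every non-empty combination of feature names."""
    for size in range(1, len(features) + 1):
        yield from combinations(features, size)


def loo_knn_accuracy(rows, labels, subset, k=3):
    """Leave-one-out accuracy of a k-nearest-neighbors vote on `subset`.

    rows:   list of dicts mapping feature name -> pre-scaled numeric reading
    labels: list of 0/1 occupancy labels, same length as rows
    """
    points = [[row[name] for name in subset] for row in rows]
    correct = 0
    for i, query in enumerate(points):
        # Distances to every other sample, excluding the held-out one.
        neighbors = sorted(
            (dist(query, other), labels[j])
            for j, other in enumerate(points)
            if j != i
        )[:k]
        votes = sum(label for _, label in neighbors)
        predicted = 1 if 2 * votes > len(neighbors) else 0
        correct += predicted == labels[i]
    return correct / len(points)


def rank_subsets(rows, labels, features=FEATURES, k=3):
    """Return (accuracy, subset) pairs, best first; ties favor fewer features."""
    scored = [
        (loo_knn_accuracy(rows, labels, subset, k), subset)
        for subset in feature_subsets(features)
    ]
    return sorted(scored, key=lambda pair: (-pair[0], len(pair[1])))
```

Breaking ties in favor of smaller subsets reflects the abstract's point that fewer sensors can lower data-collection cost when accuracy is comparable. With five features there are 31 subsets, so exhaustive scoring stays cheap.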