Abstract:The Multimodal Learning Workshop (PBVS 2024) aims to improve the performance of automatic target recognition (ATR) systems by leveraging both Synthetic Aperture Radar (SAR) data, which is difficult to interpret but remains unaffected by weather conditions and visible light, and Electro-Optical (EO) data for simultaneous learning. The subtask, known as the Multi-modal Aerial View Imagery Challenge - Classification, focuses on predicting the class label of a low-resolution aerial image based on a set of SAR-EO image pairs and their respective class labels. The provided dataset consists of SAR-EO pairs, characterized by a severe long-tail distribution with over a 1000-fold difference between the largest and smallest classes, making typical long-tail methods difficult to apply. Additionally, the domain disparity between the SAR and EO datasets complicates the effectiveness of standard multimodal methods. To address these significant challenges, we propose a two-stage learning approach that utilizes self-supervised techniques, combined with multimodal learning and inference through SAR-to-EO translation for effective EO utilization. In the final testing phase of the PBVS 2024 Multi-modal Aerial View Image Challenge - Classification (SAR Classification) task, our model achieved an accuracy of 21.45%, an AUC of 0.56, and a total score of 0.30, placing us 9th in the competition.
Abstract:In the past decades, clean and renewable energy has gained increasing attention due to a global effort on carbon footprint reduction. In particular, Saudi Arabia is gradually shifting its energy portfolio from an exclusive use of oil to a reliance on renewable energy, and, in particular, wind. Modeling wind for assessing potential energy output in a country as large, geographically diverse and understudied as Saudi Arabia is a challenge which implies highly non-linear dynamic structures in both space and time. To address this, we propose a spatio-temporal model whose spatial information is first reduced via an energy distance-based approach and then its dynamical behavior is informed by a sparse and stochastic recurrent neural network (Echo State Network). Finally, the full spatial data is reconstructed by means of a non-stationary stochastic partial differential equation-based approach. Our model can capture the fine scale wind structure and produce more accurate forecasts of both wind speed and energy in lead times of interest for energy grid management and save annually as much as one million dollar against the closest competitive model.
Abstract:The growth in artificial intelligence (AI) technology has attracted substantial interests in age-of-information (AoI)-aware task offloading of mobile edge computing (MEC)-namely, minimizing service latency. Additionally, the use of MEC systems poses an additional problem arising from limited battery resources of MDs. This paper tackles the pressing challenge of AoI-aware distributed task offloading optimization, where user association (UA), resource allocation (RA), full-task offloading, and battery of mobile devices (MDs) are jointly considered. In existing studies, joint optimization of overall task offloading and UA is seldom considered due to the complexity of combinatorial optimization problems, and in cases where it is considered, linear objective functions such as power consumption are adopted. Revolutionizing the realm of MEC, our objective includes all major components contributing to users' quality of experience, including AoI and energy consumption. To achieve this, we first formulate an NP-hard combinatorial problem, where the objective function comprises three elements: communication latency, computation latency, and battery usage. We derive a closed-form RA solution of the problem; next, we provide a distributed pricing-based UA solution. We simulate the proposed algorithm for various vision and language AI tasks. Our numerical results show that the proposed method Pareto-dominates baseline methods. More specifically, the results demonstrate that the proposed method can outperform baseline methods by 1.62 times smaller AoI with 41.2% less energy consumption.
Abstract:For Large Language Models (LLMs) to be effectively deployed in a specific country, they must possess an understanding of the nation's culture and basic knowledge. To this end, we introduce National Alignment, which measures an alignment between an LLM and a targeted country from two aspects: social value alignment and common knowledge alignment. Social value alignment evaluates how well the model understands nation-specific social values, while common knowledge alignment examines how well the model captures basic knowledge related to the nation. We constructed KorNAT, the first benchmark that measures national alignment with South Korea. For the social value dataset, we obtained ground truth labels from a large-scale survey involving 6,174 unique Korean participants. For the common knowledge dataset, we constructed samples based on Korean textbooks and GED reference materials. KorNAT contains 4K and 6K multiple-choice questions for social value and common knowledge, respectively. Our dataset creation process is meticulously designed and based on statistical sampling theory and was refined through multiple rounds of human review. The experiment results of seven LLMs reveal that only a few models met our reference score, indicating a potential for further enhancement. KorNAT has received government approval after passing an assessment conducted by a government-affiliated organization dedicated to evaluating dataset quality. Samples and detailed evaluation protocols of our dataset can be found in https://selectstar.ai/ko/papers-national-alignment
Abstract:The Multimodal Learning for Earth and Environment Workshop (MultiEarth 2023) aims to harness the substantial amount of remote sensing data gathered over extensive periods for the monitoring and analysis of Earth's ecosystems'health. The subtask, Multimodal SAR-to-EO Image Translation, involves the use of robust SAR data, even under adverse weather and lighting conditions, transforming it into high-quality, clear, and visually appealing EO data. In the context of the SAR2EO task, the presence of clouds or obstructions in EO data can potentially pose a challenge. To address this issue, we propose the Clean Collector Algorithm (CCA), designed to take full advantage of this cloudless SAR data and eliminate factors that may hinder the data learning process. Subsequently, we applied pix2pixHD for the SAR-to-EO translation and Restormer for image enhancement. In the final evaluation, the team 'CDRL' achieved an MAE of 0.07313, securing the top rank on the leaderboard.
Abstract:Toxic language, such as hate speech, can deter users from participating in online communities and enjoying popular platforms. Previous approaches to detecting toxic language and norm violations have been primarily concerned with conversations from online forums and social media, such as Reddit and Twitter. These approaches are less effective when applied to conversations on live-streaming platforms, such as Twitch and YouTube Live, as each comment is only visible for a limited time and lacks a thread structure that establishes its relationship with other comments. In this work, we share the first NLP study dedicated to detecting norm violations in conversations on live-streaming platforms. We define norm violation categories in live-stream chats and annotate 4,583 moderated comments from Twitch. We articulate several facets of live-stream data that differ from other forums, and demonstrate that existing models perform poorly in this setting. By conducting a user study, we identify the informational context humans use in live-stream moderation, and train models leveraging context to identify norm violations. Our results show that appropriate contextual information can boost moderation performance by 35\%.
Abstract:This paper reviews exploration techniques in deep reinforcement learning. Exploration techniques are of primary importance when solving sparse reward problems. In sparse reward problems, the reward is rare, which means that the agent will not find the reward often by acting randomly. In such a scenario, it is challenging for reinforcement learning to learn rewards and actions association. Thus more sophisticated exploration methods need to be devised. This review provides a comprehensive overview of existing exploration approaches, which are categorized based on the key contributions as follows reward novel states, reward diverse behaviours, goal-based methods, probabilistic methods, imitation-based methods, safe exploration and random-based methods. Then, the unsolved challenges are discussed to provide valuable future research directions. Finally, the approaches of different categories are compared in terms of complexity, computational effort and overall performance.