Abstract: Remote sensing image change captioning (RSICC) aims to describe changes between bitemporal images in natural language. Existing methods often fail under challenges such as illumination differences, viewpoint changes, and blur effects, leading to inaccuracies, especially in no-change regions. Moreover, image pairs acquired at different spatial resolutions, or with registration errors, tend to degrade the generated captions. To address these issues, we introduce SECOND-CC, a novel RSICC dataset featuring high-resolution RGB image pairs, semantic segmentation maps, and diverse real-world scenarios. SECOND-CC contains 6,041 pairs of bitemporal RS images and 30,205 sentences describing the differences between them. Additionally, we propose MModalCC, a multimodal framework that integrates semantic and visual data using advanced attention mechanisms, including Cross-Modal Cross Attention (CMCA) and Multimodal Gated Cross Attention (MGCA). Comprehensive experiments show that MModalCC outperforms state-of-the-art RSICC methods, including RSICCformer, Chg2Cap, and PSNet, with a +4.6% improvement in BLEU-4 and a +9.6% improvement in CIDEr. Detailed ablation studies and attention visualizations further demonstrate its effectiveness and its ability to address RSICC challenges. To facilitate future research, we will make our dataset and codebase publicly available at https://github.com/ChangeCapsInRS/SecondCC.
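
The abstract does not detail the CMCA and MGCA designs; as a hedged illustration of the general idea behind a multimodal gated cross-attention block, the PyTorch sketch below lets visual features attend to semantic-segmentation features and scales the fused signal with a learned gate. The class name, dimensions, and structure are illustrative assumptions, not the MModalCC implementation.

```python
import torch
import torch.nn as nn

class GatedCrossAttention(nn.Module):
    """Illustrative gated cross-attention: visual tokens (queries) attend to
    semantic-map tokens (keys/values); a sigmoid gate scales the fused signal.
    Dimensions and structure are assumptions, not the MModalCC design."""

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.norm = nn.LayerNorm(dim)

    def forward(self, visual: torch.Tensor, semantic: torch.Tensor) -> torch.Tensor:
        # visual: (B, N_v, dim) image-pair features; semantic: (B, N_s, dim) segmentation features
        attended, _ = self.cross_attn(query=visual, key=semantic, value=semantic)
        g = self.gate(torch.cat([visual, attended], dim=-1))  # per-token gate in (0, 1)
        return self.norm(visual + g * attended)               # gated residual fusion

# Quick shape check with dummy bitemporal features
block = GatedCrossAttention()
fused = block(torch.randn(2, 196, 256), torch.randn(2, 196, 256))
print(fused.shape)  # torch.Size([2, 196, 256])
```
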
Abstract: Dimensionality reduction (DR) techniques such as t-SNE and UMAP are popular for transforming complex datasets into simpler visual representations. However, while effective at uncovering general dataset patterns, these methods may introduce artifacts and suffer from interpretability issues. This paper presents DimVis, a visualization tool that employs supervised Explainable Boosting Machine (EBM) models, trained on user-selected data of interest, as an interpretation assistant for DR projections. Our tool facilitates high-dimensional data analysis by providing an interpretation of feature relevance in visual clusters through interactive exploration of UMAP projections. Specifically, DimVis uses a contrastive EBM model that is trained in real time to differentiate between the data inside and outside a cluster of interest. Taking advantage of the inherent explainability of EBMs, we then use this model to interpret the cluster itself via single and pairwise feature comparisons, ranked by the EBM's feature importances. The applicability and effectiveness of DimVis are demonstrated through two use cases involving real-world datasets, and we also discuss its limitations and potential directions for future research.
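
As a concrete sketch of the contrastive-EBM workflow described above, the snippet below marks points inside a (here simulated) region of a UMAP projection as the positive class, trains an ExplainableBoostingClassifier from the interpret package on the original high-dimensional features, and ranks single and pairwise terms by the fitted model's importances. The dataset, the median-split "cluster", and the ranking details stand in for DimVis's interactive selection and are assumptions, not the tool's code.

```python
import numpy as np
import umap
from interpret.glassbox import ExplainableBoostingClassifier
from sklearn.datasets import load_wine

# High-dimensional data and its 2-D UMAP projection
data = load_wine()
X, feature_names = data.data, data.feature_names
embedding = umap.UMAP(random_state=42).fit_transform(X)

# Simulate an interactively selected cluster: points in one region of the embedding
in_cluster = embedding[:, 0] > np.median(embedding[:, 0])

# Contrastive EBM: inside-cluster (1) vs. outside-cluster (0), trained on original features
ebm = ExplainableBoostingClassifier(feature_names=feature_names)
ebm.fit(X, in_cluster.astype(int))

# Rank single and pairwise terms by the fitted EBM's importances
global_exp = ebm.explain_global().data()
ranking = sorted(zip(global_exp["names"], global_exp["scores"]),
                 key=lambda t: t[1], reverse=True)
for name, score in ranking[:5]:
    print(f"{name}: {score:.3f}")
```
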
Abstract: A Music Recommendation System based on Emotion, Age, and Ethnicity is developed in this study, using the FER-2013 and ``Age, Gender, and Ethnicity (Face Data) CSV'' datasets. A CNN architecture, widely used for this kind of task, is applied to train the models. After adding several appropriate layers, three separate models are trained on the deep learning side of the project: emotion, ethnicity, and age. After training, these models are used as classifiers on the web application side. A snapshot of the user, taken through the interface, is sent to the models to predict the user's mood, age, and ethnic origin. Based on these predictions, various playlists pulled from the Spotify API are proposed to the user, providing a functional and user-friendly music selection experience. The user can then choose a playlist and listen to it by following the given link.
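
The abstract does not specify the network configuration, so below is a minimal Keras sketch of an emotion classifier of the kind described, assuming FER-2013's 48x48 grayscale faces and its 7 emotion labels. Layer choices and hyperparameters are illustrative, not the study's.

```python
import tensorflow as tf

# Illustrative FER-2013-style emotion classifier: 48x48 grayscale in, 7 emotions out.
# Layer sizes are assumptions; the study's actual architecture is not specified.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(48, 48, 1)),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(7, activation="softmax"),  # 7 FER-2013 emotion classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Analogous classifiers for age and ethnicity would differ mainly in the output layer, matching the label sets of the ``Age, Gender, and Ethnicity (Face Data) CSV'' dataset.
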