



Abstract: The emergence of crowdsourced data has significantly reshaped social science, enabling extensive exploration of collective human actions, viewpoints, and societal dynamics. However, ensuring safe, fair, and reliable participation remains a persistent challenge. Traditional polling methods have seen a notable decline in engagement over recent decades, raising concerns about the credibility of collected data. Meanwhile, social and peer-to-peer networks have become increasingly widespread, but data from these platforms can suffer from credibility issues due to fraudulent or ineligible participation. In this paper, we explore how social interactions can help restore credibility in crowdsourced data collected over social networks. We present an empirical study that detects ineligible participation in a polling task through AI-based graph analysis of social interactions among an imperfect pool of participants comprising honest and dishonest actors. Our approach relies solely on the structure of social interaction graphs, without examining the content being shared. We simulate different levels and types of dishonest behavior among participants who attempt to propagate the task within their social networks. We conduct experiments on real-world social network datasets, using different eligibility criteria and modeling diverse participation patterns. Although structural differences in social interaction graphs introduce some performance variability, our study achieves promising results in detecting ineligibility across diverse social and behavioral profiles, with accuracy exceeding 90% in some configurations.
Abstract: Federated Learning (FL) is a privacy-preserving sub-domain of machine learning that brings the model to the user's device for training, avoiding the need to share personal data with a central server. While existing works address data heterogeneity, they overlook other challenges in FL, such as device heterogeneity and communication efficiency. In this paper, we propose RE-FL, a novel approach that tackles computational and communication challenges in resource-constrained devices. Our variable pruning technique optimizes resource utilization by adapting pruning to each client's computational capabilities. We also employ knowledge distillation to reduce bandwidth consumption and the number of communication rounds. Experimental results on image classification tasks demonstrate the effectiveness of our approach in resource-constrained environments, maintaining data privacy and performance while accommodating heterogeneous model architectures.
Abstract: Historic dress artifacts are a valuable source for human studies. In particular, they can provide important insights into the social aspects of their corresponding era. These insights are commonly drawn from garment pictures as well as the accompanying descriptions, and are usually stored in a standardized and controlled vocabulary that accurately describes garments and costume items, called the Costume Core Vocabulary. Building an accurate Costume Core from garment descriptions can be challenging because historic garment items are often donated, and the accompanying descriptions may be written by untrained individuals using language common to the period of the items. In this paper, we present an approach that uses Natural Language Processing (NLP) to map the free-form text descriptions of historic items to the controlled vocabulary provided by the Costume Core. Despite the limited dataset, we were able to train an NLP model based on the Universal Sentence Encoder to perform this mapping with more than 90% test accuracy on a subset of the Costume Core vocabulary. We describe our methodology, design choices, and the development of our approach, and show the feasibility of predicting the Costume Core for unseen descriptions. With more garment descriptions still being curated for training, we expect higher accuracy and better generalizability.