Abstract:This study reveals the important role of prevention care and medication adherence in reducing hospitalizations. By using a structured dataset of 1,171 patients, four machine learning models Logistic Regression, Gradient Boosting, Random Forest, and Artificial Neural Networks are applied to predict five-year hospitalization risk, with the Gradient Boosting model achieving the highest accuracy of 81.2%. The result demonstrated that patients with high medication adherence and consistent preventive care can reduce 38.3% and 37.7% in hospitalization risk. The finding also suggests that targeted preventive care can have positive Return on Investment (ROI), and therefore ML models can effectively direct personalized interventions and contribute to long-term medical savings.
Abstract:Long video understanding is a complex task that requires both spatial detail and temporal awareness. While Vision-Language Models (VLMs) obtain frame-level understanding capabilities through multi-frame input, they suffer from information loss due to the sparse sampling strategy. In contrast, Video Large Language Models (Video-LLMs) capture temporal relationships within visual features but are limited by the scarcity of high-quality video-text datasets. To transfer long video understanding capabilities to VLMs with minimal data and computational cost, we propose Lightweight Video Compression (LVC), a novel method featuring the Query-Attention Video Compression mechanism, which effectively tackles the sparse sampling problem in VLMs. By training only the alignment layer with 10k short video-text pairs, LVC significantly enhances the temporal reasoning abilities of VLMs. Extensive experiments show that LVC provides consistent performance improvements across various models, including the InternVL2 series and Phi-3.5-Vision. Notably, the InternVL2-40B-LVC achieves scores of 68.2 and 65.9 on the long video understanding benchmarks MLVU and Video-MME, respectively, with relative improvements of 14.6% and 7.7%. The enhanced models and code will be publicly available soon.
Abstract:Integrated sensing and communication (ISAC) has emerged as a pivotal enabling technology for sixth-generation (6G) mobile communication system. The ISAC research in dense urban areas has been plaguing by severe multipath interference, propelling the thorough research of ISAC multipath interference elimination. However, transforming the multipath component (MPC) from enemy into friend is a viable and mutually beneficial option. In this paper, we preliminarily explore the MPC-aided ISAC signal processing and apply a space-time code to improve the ISAC performance. Specifically, we propose a symbol-level fusion for MPC-aided localization (SFMC) scheme to achieve robust and high-accuracy localization, and apply a Khatri-Rao space-time (KRST) code to improve the communication and sensing performance in rich multipath environment. Simulation results demonstrate that the proposed SFMC scheme has more robust localization performance with higher accuracy, compared with the existing state-of-the-art schemes. The proposed SFMC would benefit highly reliable communication and sub-meter level localization in rich multipath scenarios.
Abstract:Automated segmentation of medical images highly depends on the availability of accurate manual image annotations. Such annotations are very time-consuming and costly to generate, and often require specialized expertise, particularly for cross-sectional images which contain many slices for each patient. It is crucial to ensure the best use of annotation resources. In this paper, we systematically answer the question of how to select slices of cross-sectional medical images in order to maximize performance of the resulting deep learning segmentation models. We conducted experiments on 4 medical imaging segmentation tasks with varying annotation budgets, numbers of annotated cases, numbers of annotated slices per volume, slice selection techniques, and mask interpolations. We found that: 1) It is almost always preferable to annotate fewer slices per volume and more volumes given an annotation budget. 2) Selecting slices for annotation by unsupervised active learning (UAL) is not superior to selecting slices randomly or at fixed intervals, provided that each volume is allocated the same number of annotated slices. 3) Interpolating masks between annotated slices rarely enhances model performance, with exceptions of some specific configuration for 3D models.
Abstract:Segment Anything Model (SAM) has shown impressive performance in interactive and zero-shot segmentation across diverse domains, suggesting that they have learned a general concept of "objects" from their large-scale training. However, we observed that SAM struggles with certain types of objects, particularly those featuring dense, tree-like structures and low textural contrast from their surroundings. These failure modes are critical for understanding its limitations in real-world use. In order to systematically examine this issue, we propose metrics to quantify two key object characteristics: tree-likeness and textural separability. Through extensive controlled synthetic experiments and testing on real datasets, we demonstrate that SAM's performance is noticeably correlated with these factors. We link these behaviors under the concept of "textural confusion", where SAM misinterprets local structure as global texture, leading to over-segmentation, or struggles to differentiate objects from similarly textured backgrounds. These findings offer the first quantitative framework to model SAM's challenges, providing valuable insights into its limitations and guiding future improvements for vision foundation models.
Abstract:Human behavioral patterns and consumption paradigms have emerged as pivotal determinants in environmental degradation and climate change, with quotidian decisions pertaining to transportation, energy utilization, and resource consumption collectively precipitating substantial ecological impacts. Recommender systems, which generate personalized suggestions based on user preferences and historical interaction data, exert considerable influence on individual behavioral trajectories. However, conventional recommender systems predominantly optimize for user engagement and economic metrics, inadvertently neglecting the environmental and societal ramifications of their recommendations, potentially catalyzing over-consumption and reinforcing unsustainable behavioral patterns. Given their instrumental role in shaping user decisions, there exists an imperative need for sustainable recommender systems that incorporate sustainability principles to foster eco-conscious and socially responsible choices. This comprehensive survey addresses this critical research gap by presenting a systematic analysis of sustainable recommender systems. As these systems can simultaneously advance multiple sustainability objectives--including resource conservation, sustainable consumer behavior, and social impact enhancement--examining their implementations across distinct application domains provides a more rigorous analytical framework. Through a methodological analysis of domain-specific implementations encompassing transportation, food, buildings, and auxiliary sectors, we can better elucidate how these systems holistically advance sustainability objectives while addressing sector-specific constraints and opportunities. Moreover, we delineate future research directions for evolving recommender systems beyond sustainability advocacy toward fostering environmental resilience and social consciousness in society.
Abstract:In this paper, we propose OpenSatMap, a fine-grained, high-resolution satellite dataset for large-scale map construction. Map construction is one of the foundations of the transportation industry, such as navigation and autonomous driving. Extracting road structures from satellite images is an efficient way to construct large-scale maps. However, existing satellite datasets provide only coarse semantic-level labels with a relatively low resolution (up to level 19), impeding the advancement of this field. In contrast, the proposed OpenSatMap (1) has fine-grained instance-level annotations; (2) consists of high-resolution images (level 20); (3) is currently the largest one of its kind; (4) collects data with high diversity. Moreover, OpenSatMap covers and aligns with the popular nuScenes dataset and Argoverse 2 dataset to potentially advance autonomous driving technologies. By publishing and maintaining the dataset, we provide a high-quality benchmark for satellite-based map construction and downstream tasks like autonomous driving.
Abstract:Recently, unmanned aerial vehicle (UAV) has attracted much attention due to its flexible deployment and controllable mobility. As the general communication network cannot meet the emergency requirements, in this paper we study the multi-UAV enabled wireless emergency communication system. Our goal is to maximize the capacity with jointly optimizing trajectory and allocating power. To tackle this non-convex optimization problem, we first decompose it into two sub-problems to optimize the trajectory and power allocation, respectively. Then, we propose the successive convex approximation technique and the block coordinate update algorithm to solve the two subproblems. The approximate optimal solution can be obtained after continuous iterations. Simulation results show that the capacity can be greatly increased using our proposed joint trajectory optimization and power allocation.
Abstract:Food recommendation systems serve as pivotal components in the realm of digital lifestyle services, designed to assist users in discovering recipes and food items that resonate with their unique dietary predilections. Typically, multi-modal descriptions offer an exhaustive profile for each recipe, thereby ensuring recommendations that are both personalized and accurate. Our preliminary investigation of two datasets indicates that pre-trained multi-modal dense representations might precipitate a deterioration in performance compared to ID features when encapsulating interactive relationships. This observation implies that ID features possess a relative superiority in modeling interactive collaborative signals. Consequently, contemporary cutting-edge methodologies augment ID features with multi-modal information as supplementary features, overlooking the latent semantic relations between recipes. To rectify this, we present CLUSSL, a novel food recommendation framework that employs clustering and self-supervised learning. Specifically, CLUSSL formulates a modality-specific graph tailored to each modality with discrete/continuous features, thereby transforming semantic features into structural representation. Furthermore, CLUSSL procures recipe representations pertinent to different modalities via graph convolutional operations. A self-supervised learning objective is proposed to foster independence between recipe representations derived from different unimodal graphs. Comprehensive experiments on real-world datasets substantiate that CLUSSL consistently surpasses state-of-the-art recommendation benchmarks in performance.
Abstract:LLMs have achieved significant performance progress in various NLP applications. However, LLMs still struggle to meet the strict requirements for accuracy and reliability in the medical field and face many challenges in clinical applications. Existing clinical diagnostic evaluation benchmarks for evaluating medical agents powered by LLMs have severe limitations. Firstly, most existing medical evaluation benchmarks face the risk of data leakage or contamination. Secondly, existing benchmarks often neglect the characteristics of multiple departments and specializations in modern medical practice. Thirdly, existing evaluation methods are limited to multiple-choice questions, which do not align with the real-world diagnostic scenarios. Lastly, existing evaluation methods lack comprehensive evaluations of end-to-end real clinical scenarios. These limitations in benchmarks in turn obstruct advancements of LLMs and agents for medicine. To address these limitations, we introduce ClinicalLab, a comprehensive clinical diagnosis agent alignment suite. ClinicalLab includes ClinicalBench, an end-to-end multi-departmental clinical diagnostic evaluation benchmark for evaluating medical agents and LLMs. ClinicalBench is based on real cases that cover 24 departments and 150 diseases. ClinicalLab also includes four novel metrics (ClinicalMetrics) for evaluating the effectiveness of LLMs in clinical diagnostic tasks. We evaluate 17 LLMs and find that their performance varies significantly across different departments. Based on these findings, in ClinicalLab, we propose ClinicalAgent, an end-to-end clinical agent that aligns with real-world clinical diagnostic practices. We systematically investigate the performance and applicable scenarios of variants of ClinicalAgent on ClinicalBench. Our findings demonstrate the importance of aligning with modern medical practices in designing medical agents.