Abstract:Intelligent reflecting surface (IRS) and movable antenna (MA)/fluid antenna (FA) techniques have both received increasing attention in the realm of wireless communications due to their ability to reconfigure and improve wireless channel conditions. In this paper, we investigate the integration of MAs/FAs into an IRS-assisted wireless communication system. In particular, we consider the downlink transmission from a multi-MA base station (BS) to a single-antenna user with the aid of an IRS, aiming to maximize the user's received signal-to-noise ratio (SNR) by jointly optimizing the BS/IRS active/passive beamforming and the MAs' positions. Due to the similar capability of MAs and IRS for channel reconfiguration, we first conduct theoretical analyses of the performance gain of MAs over conventional fixed-position antennas (FPAs) under the line-of-sight (LoS) BS-IRS channel and derive the conditions under which the performance gain becomes more or less significant. Next, to solve the received SNR maximization problem, we propose an alternating optimization (AO) algorithm that decomposes it into two subproblems and solves them alternately. Numerical results are provided to validate our analytical results and evaluate the performance gains of MAs over FPAs under different setups.
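As a rough illustration of how alternating optimization can work for this kind of SNR maximization, the sketch below alternates between the BS transmit beamformer and the IRS phase shifts with the antenna positions held fixed. It is not the paper's algorithm. The MA position subproblem is omitted, the direct BS-user link is assumed absent, and the names (`h_r`, `G`, `ao_snr`) and the closed-form updates (maximum-ratio transmission and per-element phase alignment) are assumptions chosen for illustration.

```python
import cmath
import math
from typing import List


def effective_channel(h_r: List[complex], G: List[List[complex]],
                      theta: List[float]) -> List[complex]:
    """Cascaded BS -> IRS -> user channel, one entry per BS antenna.

    h_r[n]  : IRS element n -> user coefficient (hypothetical input)
    G[n][m] : BS antenna m -> IRS element n coefficient (hypothetical input)
    theta[n]: phase shift applied by IRS element n
    """
    num_tx = len(G[0])
    return [
        sum(h_r[n].conjugate() * cmath.exp(1j * theta[n]) * G[n][m]
            for n in range(len(h_r)))
        for m in range(num_tx)
    ]


def ao_snr(h_r: List[complex], G: List[List[complex]],
           power: float = 1.0, noise: float = 1.0, iters: int = 5) -> List[float]:
    """Alternate MRT beamforming and IRS phase alignment.

    Returns the received SNR after each full iteration.
    """
    theta = [0.0] * len(h_r)
    history = []
    for _ in range(iters):
        # Step 1: for fixed phases, MRT is optimal: w = conj(h_eff) / ||h_eff||.
        h_eff = effective_channel(h_r, G, theta)
        norm = math.sqrt(sum(abs(x) ** 2 for x in h_eff))
        w = [x.conjugate() / norm for x in h_eff]

        # Step 2: for fixed w, rotate every reflected term onto the positive
        # real axis so that all terms add coherently at the user.
        terms = [
            h_r[n].conjugate() * sum(G[n][m] * w[m] for m in range(len(w)))
            for n in range(len(h_r))
        ]
        theta = [-cmath.phase(c) for c in terms]

        signal = sum(abs(c) for c in terms)  # |sum_n c_n e^{j theta_n}| after alignment
        history.append(power * signal ** 2 / noise)
    return history
```

Because each step is optimal for its own block of variables, the SNR values in the returned list never decrease from one iteration to the next.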
Abstract:Fluid antennas (FAs) and movable antennas (MAs) are innovative technologies in wireless communications that are able to proactively improve channel conditions by dynamically adjusting the transmit/receive antenna positions within a given spatial region. In this paper, we investigate an MA-enhanced multiple-input single-output (MISO) secure communication system, aiming to maximize the secrecy rate by jointly optimizing the positions of multiple MAs. Instead of continuously searching for the optimal MA positions as in prior works, we propose to discretize the transmit region into multiple sampling points, thereby converting the continuous antenna position optimization into a discrete sampling point selection problem. However, this point selection problem is combinatorial and thus difficult to solve optimally. To tackle this challenge, we transform this combinatorial problem into a recursive path selection problem in graph theory and propose a partial enumeration algorithm to obtain its optimal solution without the need for high-complexity exhaustive search. To further reduce the complexity, a linear-time sequential update algorithm is also proposed to obtain a high-quality suboptimal solution. Numerical results show that our proposed algorithms yield much higher secrecy rates than the conventional fixed-position antenna (FPA) and other baseline schemes.
Abstract:Forensic pathology is critical in determining the cause and manner of death through post-mortem examinations, both macroscopic and microscopic. The field, however, grapples with issues such as outcome variability, laborious processes, and a scarcity of trained professionals. This paper presents SongCi, an innovative visual-language model (VLM) designed specifically for forensic pathology. SongCi utilizes advanced prototypical cross-modal self-supervised contrastive learning to enhance the accuracy, efficiency, and generalizability of forensic analyses. It was pre-trained and evaluated on a comprehensive multi-center dataset, which includes over 16 million high-resolution image patches, 2,228 vision-language pairs of post-mortem whole slide images (WSIs), and corresponding gross key findings, along with 471 distinct diagnostic outcomes. Our findings indicate that SongCi surpasses existing multi-modal AI models in many forensic pathology tasks, performs comparably to experienced forensic pathologists and significantly better than less experienced ones, and provides detailed multi-modal explainability, offering critical assistance in forensic investigations. To the best of our knowledge, SongCi is the first VLM specifically developed for forensic pathological analysis and the first large-vocabulary computational pathology (CPath) model that directly processes gigapixel WSIs in forensic science.
Abstract:Fluid antennas (FAs) and movable antennas (MAs) have drawn increasing attention in wireless communications recently due to their ability to create favorable channel conditions via local antenna movement within a confined region. In this letter, we advance their application for cognitive radio to facilitate efficient spectrum sharing between primary and secondary communication systems. In particular, we aim to jointly optimize the transmit beamforming and MA positions at a secondary transmitter (ST) to maximize the received signal power at a secondary receiver (SR) subject to constraints on the co-channel interference power it imposes on multiple primary receivers (PRs). However, such an optimization problem is difficult to solve optimally because the received signal/interference power at the SR/all PRs is a highly nonlinear function of the MA positions. To derive useful insights, we first perform theoretical analyses to unveil MAs' capability to achieve maximum-ratio transmission with the SR and effective interference mitigation for all PRs at the same time. To solve the MA position optimization problem, we propose an alternating optimization (AO) algorithm to obtain a high-quality suboptimal solution. Numerical results demonstrate that our proposed algorithms can significantly outperform the conventional fixed-position antennas (FPAs) and other baseline schemes.
Abstract:Fluid antennas (FAs) and movable antennas (MAs) have emerged as promising technologies in wireless communications, which offer the flexibility to improve channel conditions by adjusting transmit/receive antenna positions within a spatial region. In this letter, we focus on an MA-enhanced multiple-input single-output (MISO) communication system, aiming to optimize the positions of multiple transmit MAs to maximize the received signal power. Unlike prior works that continuously search for the optimal MA positions, we propose to sample the transmit region into discrete points, such that the continuous antenna position optimization problem is transformed into a discrete sampling point selection problem based on the point-wise channel information. However, such a point selection problem is combinatorial and challenging to solve optimally. To tackle this challenge, we recast it as an equivalent fixed-hop shortest path problem in graph theory and propose a customized algorithm to solve it optimally in polynomial time. To further reduce the complexity, a linear-time sequential update algorithm is also proposed to obtain a high-quality suboptimal solution. Numerical results demonstrate that the proposed algorithms can yield considerable performance gains over the conventional fixed-position antennas with/without antenna selection.
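The idea of turning discrete point selection into a fixed-hop path problem can be sketched with a small dynamic program. This is a simplified illustration, not the paper's customized algorithm, and it rests on several assumptions: the sampling points lie on a line and are uniformly spaced, the antennas must be at least `min_gap` sampling indices apart, and maximum-ratio transmission is used so that the received power is proportional to the sum of the per-point channel power gains. The function name `select_points` and its parameters are hypothetical.

```python
from typing import List, Tuple


def select_points(gains: List[float], num_antennas: int,
                  min_gap: int) -> Tuple[float, List[int]]:
    """Choose num_antennas sampling points, at least min_gap indices apart,
    that maximize the total channel power gain.

    Each layer k of the table is one "hop" of the path:
    best[k][i] = largest total gain with k + 1 antennas, the last one at point i.
    """
    m = len(gains)
    neg_inf = float("-inf")
    best = [[neg_inf] * m for _ in range(num_antennas)]
    prev = [[-1] * m for _ in range(num_antennas)]
    best[0] = list(gains)

    for k in range(1, num_antennas):
        run_val, run_idx = neg_inf, -1  # best entry of layer k-1 among points <= i - min_gap
        for i in range(m):
            j = i - min_gap
            if j >= 0 and best[k - 1][j] > run_val:
                run_val, run_idx = best[k - 1][j], j
            if run_idx >= 0:
                best[k][i] = run_val + gains[i]
                prev[k][i] = run_idx

    last = max(range(m), key=lambda i: best[num_antennas - 1][i])
    if best[num_antennas - 1][last] == neg_inf:
        raise ValueError("not enough sampling points for the requested spacing")

    # Walk the predecessor pointers back through the layers to recover the path.
    path, i = [], last
    for k in range(num_antennas - 1, -1, -1):
        path.append(i)
        i = prev[k][i]
    return best[num_antennas - 1][last], path[::-1]
```

Under these assumptions the table has one row per antenna and one column per sampling point, so the running time grows with their product rather than with the number of antenna-position combinations.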
Abstract:The congruence between affective experiences and physiological changes has been a debated topic for centuries. Recent technological advances in measurement and data analysis offer hope of solving this long-standing challenge. Open science and open data practices, together with data analysis challenges open to the academic community, are also promising tools for solving this problem. In this entry to the Emotion Physiology and Experience Collaboration (EPiC) challenge, we propose a data analysis solution that combines theoretical assumptions with data-driven methodologies. We used feature engineering and ensemble selection. Each predictor was trained on subsets of the training data that would maximize the information available for training. Late fusion was used with an averaging step. We chose to average following a ``wisdom of crowds'' strategy. This strategy yielded an overall RMSE of 1.19 on the test set. Future work should carefully explore whether our assumptions are correct and assess the potential of weighted fusion.
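The late-fusion averaging step and the RMSE metric can be sketched in a few lines. This is a minimal illustration that assumes every predictor outputs one value per test sample. The function names and the toy numbers are hypothetical and are not taken from the challenge entry.

```python
import math
from typing import List, Sequence


def late_fusion_average(predictions: Sequence[Sequence[float]]) -> List[float]:
    """Unweighted late fusion: average the outputs of all predictors per sample."""
    num_models = len(predictions)
    return [sum(sample) / num_models for sample in zip(*predictions)]


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error between targets and predictions."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


# Toy example: two predictors whose errors have opposite signs.
targets = [5.0, 5.0, 5.0]
model_a = [4.0, 6.0, 4.5]
model_b = [6.0, 4.0, 5.5]
fused = late_fusion_average([model_a, model_b])
```

In the toy example the individual errors cancel in the average, which is the intuition behind the "wisdom of crowds" choice. Weighted fusion would replace the plain mean with a weighted one.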
Abstract:Recent advances in computer vision (CV) and natural language processing have been driven by exploiting big data in practical applications. However, these research fields are still limited by the sheer volume, versatility, and diversity of the available datasets. CV tasks such as image captioning, which has primarily been carried out on natural images, still struggle to produce accurate and meaningful captions for the sketched images often included in scientific and technical documents. The advancement of other tasks, such as 3D reconstruction from 2D images, requires larger datasets with multiple viewpoints. We introduce DeepPatent2, a large-scale dataset providing more than 2.7 million technical drawings with 132,890 object names and 22,394 viewpoints extracted from 14 years of US design patent documents. We demonstrate the usefulness of DeepPatent2 with conceptual captioning. We further discuss the potential of our dataset to facilitate other research areas such as 3D image reconstruction and image retrieval.
Abstract:Accurate acne detection plays a crucial role in obtaining a precise diagnosis and conducting proper therapy. However, the ambiguous boundaries and arbitrary dimensions of acne lesions severely limit the performance of existing methods. In this paper, we address these challenges via a novel Decoupled Sequential Detection Head (DSDH), which can be easily adopted by mainstream two-stage detectors. DSDH brings two simple but effective improvements to acne detection. First, the offset and scaling tasks are explicitly introduced, and their incompatibility is settled by our task-decouple mechanism, which improves the capability of predicting the location and size of acne lesions. Second, we propose the task-sequence mechanism, which executes offset and scaling sequentially to gain a more comprehensive insight into the dimensions of acne lesions. In addition, we build a high-quality acne detection dataset named ACNE-DET to verify the effectiveness of DSDH. Experiments on ACNE-DET and the public benchmark ACNE04 show that our method outperforms the state-of-the-art methods by significant margins. Our code and dataset are publicly available at (temporarily anonymous).
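To make the ordering of the two regression tasks concrete, the sketch below applies an offset to a box center first and a scaling to the box size second. It illustrates only the box arithmetic, not the DSDH network head. The delta parameterization (center shifts relative to box size, log-space size factors) is the common two-stage detector convention and is an assumption here, as are all function names.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (center_x, center_y, width, height)


def apply_offset(box: Box, dx: float, dy: float) -> Box:
    """Shift the box center by a fraction of the box size; size is unchanged."""
    cx, cy, w, h = box
    return (cx + dx * w, cy + dy * h, w, h)


def apply_scaling(box: Box, dw: float, dh: float) -> Box:
    """Rescale width and height by exp(delta); center is unchanged."""
    cx, cy, w, h = box
    return (cx, cy, w * math.exp(dw), h * math.exp(dh))


def refine(box: Box, offset: Tuple[float, float],
           scaling: Tuple[float, float]) -> Box:
    """Sequential refinement: offset first, then scaling."""
    return apply_scaling(apply_offset(box, *offset), *scaling)
```

For example, `refine((10.0, 10.0, 4.0, 2.0), (0.5, -0.5), (0.0, 0.0))` moves the center to (12.0, 9.0) and leaves the size at 4.0 by 2.0.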
Abstract:Long-term vertebral fractures severely affect patients' quality of life, causing kyphosis, lumbar deformity, and even paralysis. Computed tomography (CT) is a common clinical examination to screen for this disease at early stages. However, the faint radiological appearances and nonspecific symptoms lead to a high risk of missed diagnosis. In particular, mild fractures and normal controls are quite difficult to distinguish for deep learning models and inexperienced doctors. In this paper, we argue that reinforcing the faint fracture features to encourage inter-class separability is the key to improving the accuracy. Motivated by this, we propose a supervised contrastive learning based model to estimate Genant's grade of vertebral fracture from CT scans. The supervised contrastive learning, as an auxiliary task, narrows the distance between features within the same class while pushing others away, which enhances the model's capability of capturing subtle features of vertebral fractures. Considering the lack of datasets in this field, we construct a database including 208 samples annotated by experienced radiologists. Our method has a specificity of 99\% and a sensitivity of 85\% in binary classification, and a macro-F1 of 77\% in multi-class classification, indicating that contrastive learning significantly improves the accuracy of vertebral fracture screening, especially for mild fractures and normal controls. Our desensitized data and code will be made publicly available for the community.
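The effect of pulling same-class features together while pushing others away can be illustrated with a generic supervised contrastive loss. The sketch below follows the widely used formulation with L2-normalized features and a temperature. The abstract does not state the exact loss used by the authors, so the formula, the temperature value, and the function name `supcon_loss` are assumptions. Features are assumed to be nonzero vectors.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    """Scale a nonzero vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supcon_loss(features: Sequence[Sequence[float]], labels: Sequence[int],
                temperature: float = 0.1) -> float:
    """Generic supervised contrastive loss over one batch.

    For each anchor i, every other sample with the same label is a positive.
    The loss is the mean negative log-probability of the positives under a
    softmax over all other samples. Anchors without positives are skipped.
    """
    z = [_normalize(f) for f in features]
    n = len(z)
    total, num_anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue
        # Scaled cosine similarity between the anchor and every other sample.
        logits = {
            a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
            for a in range(n) if a != i
        }
        # Log-sum-exp with max subtraction for numerical stability.
        peak = max(logits.values())
        log_denom = peak + math.log(sum(math.exp(v - peak) for v in logits.values()))
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        num_anchors += 1
    return total / num_anchors if num_anchors else 0.0
```

The loss is small when samples of the same class already point in the same direction and large when same-class samples are far apart, which is the pressure that sharpens the boundary between mild fractures and normal controls.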
Abstract:Computed tomography (CT) samples with pathological annotations are difficult to obtain. As a result, computer-aided diagnosis (CAD) algorithms are trained on small datasets (e.g., LIDC-IDRI with 1,018 samples), limiting their accuracy and reliability. In the past five years, several works have been tailored to unsupervised representation learning of CT lesions via two-dimensional (2D) and three-dimensional (3D) self-supervised learning (SSL) algorithms. The 2D algorithms have difficulty capturing 3D information, and existing 3D algorithms are computationally heavy. Light-weight 3D SSL remains a frontier to explore. In this paper, we propose spiral contrastive learning (SCL), which yields 3D representations in a computationally efficient manner. SCL first transforms 3D lesions to the 2D plane using an information-preserving spiral transformation, and then learns transformation-invariant features using 2D contrastive learning. For the augmentation, we consider natural image augmentations and medical image augmentations. We evaluate SCL by training a classification head upon the embedding layer. Experimental results show that SCL achieves state-of-the-art accuracy on LIDC-IDRI (89.72%), LNDb (82.09%) and TianChi (90.16%) for unsupervised representation learning. With 10% annotated data for fine-tuning, the performance of SCL is comparable to that of supervised learning algorithms (85.75% vs. 85.03% on LIDC-IDRI, 78.20% vs. 73.44% on LNDb and 87.85% vs. 83.34% on TianChi, respectively). Meanwhile, SCL reduces the computational effort by 66.98% compared to other 3D SSL algorithms, demonstrating the efficiency of the proposed method in unsupervised pre-training.