Abstract:Advancements in Multimodal Large Language Models (MLLMs) have significantly improved performance on medical tasks such as Visual Question Answering (VQA) and Report Generation (RG). However, the fairness of these models across diverse demographic groups remains underexplored, despite its importance in healthcare. This oversight is partly due to the lack of demographic diversity in existing medical multimodal datasets, which complicates fairness evaluation. In response, we propose FMBench, the first benchmark designed to evaluate the fairness of MLLM performance across diverse demographic attributes. FMBench has the following key features: 1: It covers four demographic attributes (race, ethnicity, language, and gender) across two tasks, VQA and RG, under zero-shot settings. 2: Its VQA task is free-form, which enhances real-world applicability and mitigates the biases associated with predefined answer choices. 3: It uses both lexical metrics and LLM-based metrics aligned with clinical evaluations, assessing models not only for linguistic accuracy but also from a clinical perspective. Furthermore, we introduce a new metric, Fairness-Aware Performance (FAP), to evaluate how fairly MLLMs perform across demographic attributes. On the proposed benchmark, we thoroughly evaluate the performance and fairness of eight state-of-the-art open-source MLLMs, including both general and medical MLLMs, ranging from 7B to 26B parameters. We aim for FMBench to assist the research community in refining model evaluation and driving future advancements in the field. All data and code will be released upon acceptance.
Abstract:Movable antenna (MA) technology, which enables local movement of antennas to create more favorable channel conditions, has emerged as a promising way to improve the performance of wireless communication systems. In this letter, we apply it to an over-the-air computation (AirComp) network, where an access point equipped with a two-dimensional (2D) MA array aggregates wireless data from a massive number of users. We aim to minimize the computation mean square error (CMSE) by jointly optimizing the antenna position vector (APV), the receive combining vector at the access point, and the transmit coefficients of all users. To tackle this highly non-convex problem, we propose a two-loop iterative algorithm, in which particle swarm optimization (PSO) is leveraged to obtain a suboptimal APV in the outer loop, while the receive combining vector and transmit coefficients are alternately optimized in the inner loop. Numerical results demonstrate that the proposed MA-enhanced AirComp network outperforms the conventional network with fixed-position antennas (FPAs).
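The inner loop can be illustrated with a minimal sketch under an assumed model that the abstract does not spell out: a CMSE of the form $\sum_k |\mathbf{a}^H\mathbf{h}_k b_k - 1|^2 + \sigma^2\|\mathbf{a}\|^2$, with channels $\mathbf{h}_k$ evaluated at a fixed APV and a per-user power budget $|b_k|^2 \le P$. Under these assumptions each block update has a closed form: truncated channel inversion for $b_k$ and an MMSE-type solution for $\mathbf{a}$. The names `aircomp_mse`, `inner_loop`, `p_max`, and `noise_var` are hypothetical, and the PSO outer loop (which would rebuild the channel matrix for each candidate APV and use the inner loop as its fitness function) is not shown.

```python
import numpy as np

def aircomp_mse(a, H, b, noise_var):
    """Assumed CMSE: sum_k |a^H h_k b_k - 1|^2 + noise_var * ||a||^2.

    H has shape (N, K); column k is the channel h_k of user k at a fixed APV.
    """
    eff = (a.conj() @ H) * b  # effective gains a^H h_k b_k for all users
    return float(np.sum(np.abs(eff - 1) ** 2) + noise_var * np.linalg.norm(a) ** 2)

def inner_loop(H, p_max, noise_var, iters=50):
    """Alternate closed-form updates of the transmit coefficients b and combiner a."""
    N, K = H.shape
    a = H.sum(axis=1)
    a = a / np.linalg.norm(a)  # simple initial combiner (an assumption)
    history = []
    for _ in range(iters):
        # b-update: invert the effective channel, clipped to the power budget
        g = a.conj() @ H  # scalar effective channels a^H h_k
        b = g.conj() / np.abs(g) ** 2
        too_big = np.abs(b) ** 2 > p_max
        b[too_big] = np.sqrt(p_max) * g[too_big].conj() / np.abs(g[too_big])
        # a-update: minimize the CMSE over a with b fixed (MMSE-type solution)
        G = H * b  # column k is h_k b_k
        a = np.linalg.solve(G @ G.conj().T + noise_var * np.eye(N), G.sum(axis=1))
        history.append(aircomp_mse(a, H, b, noise_var))
    return a, b, history
```

Because each update exactly minimizes the assumed CMSE over one block of variables, the recorded CMSE is non-increasing across iterations, which makes the inner loop a well-behaved fitness evaluator for a PSO search over antenna positions.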
Abstract:Movable antenna (MA) technology has attracted increasing attention in wireless communications owing to its ability to flexibly adjust the positions of multiple antennas within a local region to reconfigure channel conditions. In this paper, we investigate its application in an amplify-and-forward (AF) relay system, where a multi-MA AF relay is deployed to assist wireless communication from a source to a destination. In particular, we aim to maximize the achievable rate at the destination by jointly optimizing the AF weight matrix at the relay and the positions of its MAs in two stages: one for receiving the signal from the source and one for transmitting its amplified version to the destination. Compared with existing one-stage antenna position optimization, this two-stage position optimization is more challenging because the two sets of positions are intricately coupled in the achievable rate at the destination. To tackle this challenge, we decompose the problem into several subproblems using alternating optimization (AO) and solve them via semidefinite programming and gradient ascent. Numerical results demonstrate the superiority of the proposed system over a conventional relaying system with fixed-position antennas (FPAs) and also yield essential insights.
Abstract:Video-based surgical instrument segmentation plays an important role in robot-assisted surgery. Unlike supervised methods, unsupervised segmentation relies heavily on motion cues, which are difficult to exploit because optical flow in surgical footage is typically of lower quality than in natural scenes. This poses a considerable obstacle to the advancement of unsupervised segmentation techniques. In this work, we address the challenge of improving model performance despite the inherent limitations of low-quality optical flow. Our methodology takes a three-pronged approach: extracting boundaries directly from the optical flow, selectively discarding frames with inferior flow quality, and employing a fine-tuning process with variable frame rates. We thoroughly evaluate our strategy on the EndoVis2017 VOS dataset and the EndoVis2017 Challenge dataset, where our model achieves a mean Intersection-over-Union (mIoU) of 0.75 and 0.72, respectively. Our findings suggest that our approach can substantially reduce the need for manual annotations in clinical environments and may facilitate the annotation of new datasets. The code is available at https://github.com/wpr1018001/Rethinking-Low-quality-Optical-Flow.git
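For reference, mIoU over binary instrument masks can be computed as below. This is a minimal sketch assuming a per-frame IoU averaged over all frames; the abstract does not state the exact averaging convention (per frame, per video, or per class) or how frames with empty masks are scored, so those choices are assumptions, and the function names are hypothetical.

```python
import numpy as np

def iou(pred, gt):
    """Intersection-over-Union of two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Both masks empty: scored as a perfect match (an assumed convention)
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)

def mean_iou(preds, gts):
    """Average the per-frame IoU over paired prediction/ground-truth masks."""
    scores = [iou(p, g) for p, g in zip(preds, gts)]
    return float(np.mean(scores))
```

For example, a frame where the prediction covers two pixels and the ground truth one of them scores 0.5; averaged with a perfectly segmented frame, the mIoU is 0.75.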
Abstract:Unmanned aerial vehicle (UAV) communications are widely regarded as a promising technology for supporting air-to-ground communications in forthcoming sixth-generation (6G) wireless networks. This paper proposes a novel air-to-ground communication model, consisting of UAV-mounted aerial base stations and terrestrial user equipments (UEs), that integrates coordinated multi-point (CoMP) transmission with stochastic geometry. In particular, a CoMP set consisting of multiple UAVs is constructed based on Poisson-Delaunay tetrahedralization. Effective UAV formation control and UAV swarm tracking schemes are also developed using multi-agent system theory for two typical scenarios, with static and with mobile UEs, to ensure that collaborative UAVs can efficiently reach their target spatial positions for mission execution. Owing to its mathematical tractability, the model yields explicit expressions for the coverage probability and achievable ergodic rate of a typical UE. Extensive simulation and numerical results corroborate that the proposed scheme outperforms UAV communications without CoMP transmission and achieves performance similar to the conventional CoMP scheme while avoiding search overhead.
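Monte Carlo simulation is a common way to sanity-check closed-form coverage expressions of this kind. The sketch below assumes a simplified model that is not the paper's: UAVs form a 2D Poisson point process (PPP) of density `density` at a common altitude `height`, truncated to a disk of radius `radius` around a typical UE at the origin; links experience Rayleigh fading and path-loss exponent `alpha`; and, as a crude stand-in for the Poisson-Delaunay CoMP set, the `k_coop` nearest UAVs serve the UE with non-coherently combined received powers. All names and parameters are hypothetical.

```python
import numpy as np

def coverage_probability(density, height, radius, alpha, theta_db,
                         k_coop=1, trials=2000, seed=0):
    """Estimate P(SIR > theta) for a typical UE at the origin (assumed model)."""
    rng = np.random.default_rng(seed)
    theta = 10.0 ** (theta_db / 10.0)
    covered = 0
    for _ in range(trials):
        # All random draws happen before any branching, so runs with the same
        # seed but different k_coop see identical UAV layouts and fading.
        n = rng.poisson(density * np.pi * radius ** 2)
        r = radius * np.sqrt(rng.uniform(size=n))  # radii of uniform points in the disk
        fading = rng.exponential(size=n)           # Rayleigh fading power gains
        if n == 0:
            continue  # no UAV in range: counted as not covered
        # Only distances matter here, so azimuth angles are not sampled
        d = np.sqrt(r ** 2 + height ** 2)          # 3D UAV-to-UE distance
        power = fading * d ** (-alpha)
        order = np.argsort(d)
        signal = power[order[:k_coop]].sum()       # cooperating (nearest) UAVs
        interference = power[order[k_coop:]].sum() # all remaining UAVs
        if interference == 0.0 or signal > theta * interference:
            covered += 1
    return covered / trials
```

With the same seed, enlarging `k_coop` moves received power from interference to signal in every realization, so the estimate cannot decrease. This illustrates qualitatively why cooperation helps, without reproducing the paper's Delaunay-based set formation or its formation control.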
Abstract:In Internet of Things (IoT) networks, edge learning for data-driven tasks enables intelligent applications and services. As the network grows, different users may generate distinct datasets. Thus, to support multiple edge learning tasks in large-scale IoT networks, this paper achieves efficient task-oriented communication through the joint design of wireless resource allocation and edge learning error prediction. In particular, we first perform multi-user scheduling to alleviate co-channel interference in dense networks. Then, we perform optimal power allocation in parallel for the different learning tasks. Extensive experimental results corroborate that, owing to the high degree of parallelization in the designed algorithm, the multi-user scheduling and task-oriented power allocation efficiently improve the performance of distinct edge learning tasks compared with state-of-the-art benchmark algorithms.
Abstract:Efficient channel estimation is challenging in full-dimensional multiple-input multiple-output communication systems, particularly those with hybrid digital-analog architectures. Under a compressive sensing framework, this letter first designs a uniform dictionary based on a spherical Fibonacci grid to represent channels in a sparse domain, yielding smaller angular errors in the three-dimensional beamspace than traditional dictionaries. Then, a Bayesian inference-aided greedy pursuit algorithm is developed to estimate channels in the frequency domain. Finally, simulation results demonstrate that both the designed dictionary and the proposed Bayesian channel estimation outperform the benchmark schemes, attaining a lower normalized mean squared error (NMSE) of channel estimation.
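The spherical Fibonacci construction itself is standard and easy to sketch. The code below places `n` nearly uniform directions on the unit sphere using the golden-angle spiral (heights $z_i = 1 - (2i+1)/n$, azimuths $i$ times the golden angle), keeps the upper hemisphere, and stacks the corresponding steering vectors of a uniform planar array into a dictionary. The array geometry (an `nx`-by-`ny` UPA in the x-y plane with half-wavelength spacing), the hemisphere restriction, and the function names are assumptions for illustration; the letter's actual dictionary construction may differ.

```python
import numpy as np

def spherical_fibonacci(n):
    """Return n nearly uniform unit vectors (shape (n, 3)) on the sphere."""
    golden_angle = np.pi * (3.0 - np.sqrt(5.0))
    i = np.arange(n)
    z = 1.0 - (2.0 * i + 1.0) / n  # equal-area spacing in height
    rho = np.sqrt(1.0 - z ** 2)     # radius of the horizontal circle at height z
    phi = golden_angle * i          # azimuth advances by the golden angle
    return np.stack([rho * np.cos(phi), rho * np.sin(phi), z], axis=1)

def upa_dictionary(directions, nx, ny, spacing=0.5):
    """Steering-vector dictionary for an nx-by-ny UPA in the x-y plane.

    spacing is in wavelengths (0.5 = half-wavelength, an assumption).
    Returns an (nx*ny, len(directions)) matrix with unit-norm columns.
    """
    mx, my = np.meshgrid(np.arange(nx), np.arange(ny), indexing="ij")
    pos = np.stack([mx.ravel(), my.ravel()], axis=1) * spacing
    phase = 2.0 * np.pi * pos @ directions[:, :2].T  # 2*pi*(x*u_x + y*u_y)
    return np.exp(1j * phase) / np.sqrt(nx * ny)
```

Restricting to z > 0 matters for a planar array in the x-y plane: directions mirrored through that plane produce identical steering vectors, so the lower hemisphere adds no new angular information. For example, `upa_dictionary(pts[pts[:, 2] > 0], 4, 4)` with `pts = spherical_fibonacci(200)` yields a 16-by-100 dictionary.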
Abstract:As a forerunner of 5G technologies, Narrowband Internet of Things (NB-IoT) will inevitably coexist with the legacy Long-Term Evolution (LTE) system. Thus, it is imperative for NB-IoT to mitigate LTE interference. Exploiting the strong temporal correlation of the NB-IoT signal, this letter develops a sparsity adaptive algorithm that recovers the NB-IoT signal in the presence of legacy LTE interference by combining $K$-means clustering and sparsity adaptive matching pursuit (SAMP). In particular, the support of the NB-IoT signal is first coarsely estimated by $K$-means clustering and the SAMP algorithm without a sparsity limitation. Then, the estimated support is refined by a repeat mechanism. Simulation results demonstrate the effectiveness of the developed algorithm in terms of recovery probability and bit error rate compared with competing algorithms.
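The coarse support-estimation step can be sketched as a two-cluster ($K = 2$) $K$-means on entry magnitudes, labeling the high-magnitude cluster as the support. This is an assumption about how the clustering is applied: the abstract does not say which quantity is clustered (for example, a correlation or least-squares proxy of the NB-IoT signal), and the SAMP stage and the repeat-based refinement are not shown. The function name `kmeans_support` is hypothetical.

```python
import numpy as np

def kmeans_support(x, max_iter=100):
    """Coarse support estimate via 1-D 2-means on |x| (assumed usage).

    Returns the indices assigned to the high-magnitude cluster.
    """
    mag = np.abs(np.asarray(x))  # works for real or complex input
    centers = np.array([mag.min(), mag.max()], dtype=float)  # low / high
    labels = None
    for _ in range(max_iter):
        # Assign each entry to the nearer center (1 = high-magnitude cluster)
        new_labels = (np.abs(mag - centers[1]) < np.abs(mag - centers[0])).astype(int)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stable: converged
        labels = new_labels
        # Move each non-empty cluster's center to the mean of its members
        for c in (0, 1):
            members = mag[labels == c]
            if members.size:
                centers[c] = members.mean()
    return np.flatnonzero(labels == 1)
```

Because 2-means picks the magnitude threshold from the data, no sparsity level has to be fixed in advance, which fits the sparsity-adaptive setting described above; a greedy method such as SAMP could then start from this coarse support.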
Abstract:Wireless powered backscatter communications (WPBC) enables ultra-low-power communication and is thus promising for Internet of Things (IoT) networks. In practice, however, applying WPBC in large-scale IoT networks is challenging because of its short communication range. To address this challenge, this paper exploits an unmanned ground vehicle (UGV) to assist WPBC in large-scale IoT networks. In particular, we investigate the joint design of network planning and dynamic resource allocation of the access point (AP), tag reader, and UGV to minimize the total energy consumption. The AP can operate in either half-duplex (HD) or full-duplex (FD) multiplexing mode. In HD mode, the optimal cell radius is derived, and the optimal power allocation and transmit/receive beamforming are obtained in closed form. In FD mode, the optimal resource allocation is developed, together with two suboptimal schemes of low computational complexity. Simulation results disclose that dynamic power allocation at the tag reader, rather than at the AP, dominates the network energy efficiency, while an AP operating in FD mode outperforms one operating in HD mode in terms of energy efficiency.