Abstract:Multi-modal Large Language Models (MLLMs) have exhibited impressive capability. However, recently many deficiencies of MLLMs have been found compared to human intelligence, $\textit{e.g.}$, hallucination. To drive the MLLMs study, the community dedicated efforts to building larger benchmarks with complex tasks. In this paper, we propose benchmarking an essential but usually overlooked intelligence: $\textbf{association}$, a human's basic capability to link observation and prior practice memory. To comprehensively investigate MLLM's performance on the association, we formulate the association task and devise a standard benchmark based on adjective and verb semantic concepts. Instead of costly data annotation and curation, we propose a convenient $\textbf{annotation-free}$ construction method transforming the general dataset for our association tasks. Simultaneously, we devise a rigorous data refinement process to eliminate confusion in the raw dataset. Building on this database, we establish three levels of association tasks: single-step, synchronous, and asynchronous associations. Moreover, we conduct a comprehensive investigation into the MLLMs' zero-shot association capabilities, addressing multiple dimensions, including three distinct memory strategies, both open-source and closed-source MLLMs, cutting-edge Mixture-of-Experts (MoE) models, and the involvement of human experts. Our systematic investigation shows that current open-source MLLMs consistently exhibit poor capability in our association tasks, even the currently state-of-the-art GPT-4V(vision) also has a significant gap compared to humans. We believe our benchmark would pave the way for future MLLM studies. $\textit{Our data and code are available at:}$ https://mvig-rhos.com/llm_inception.
Abstract:Reading ability detection is important in modern educational field. In this paper, a method of predicting scores of reading ability is proposed, using the eye-tracking data of a few subjects (e.g., 68 subjects). The proposed method built a regression model for the score prediction by combining Long Short Time Memory (LSTM) and light-weighted neural networks. Experiments show that with few-shot learning strategy, the proposed method achieved higher accuracy than previous methods of score prediction in reading ability detection. The code can later be downloaded at https://github.com/pumpkinLNX/LSTM-eye-tracking-pytorch.git
Abstract:A novel multicast communication system with movable antennas (MAs) is proposed, where the antenna position optimization is exploited to enhance the transmission rate. Specifically, an MA-assisted two-user multicast multiple-input single-input system is considered. The joint optimization of the transmit beamforming vector and transmit MA positions is studied by modeling the motion of the MA elements as discrete movements. A low-complexity greedy search-based algorithm is proposed to tackle this non-convex inter-programming problem. A branch-and-bound (BAB)-based method is proposed to achieve the optimal multicast rate with a reduced time complexity than the brute-force search by assuming the two users suffer similar line-of-sight path losses. Numerical results reveal that the proposed MA systems significantly improve the multicast rate compared to conventional fixed-position antennas (FPAs)-based systems.
Abstract:A pioneering secure transmission scheme is proposed, which harnesses movable antennas (MAs) to optimize antenna positions for augmenting the physical layer security. Particularly, an MA-enabled secure wireless system is considered, where a multi-antenna transmitter communicates with a single-antenna receiver in the presence of an eavesdropper. The beamformer and antenna positions at the transmitter are jointly optimized under two criteria: power consumption minimization and secrecy rate maximization. For each scenario, a novel suboptimal algorithm was proposed to tackle the resulting nonconvex optimization problem, capitalizing on the approaches of alternating optimization and gradient descent. Numerical results demonstrate that the proposed MA systems significantly improve physical layer security compared to various benchmark schemes relying on conventional fixed-position antennas (FPAs).
Abstract:Active reconfigurable intelligent surface (ARIS) is a newly emerging RIS technique that leverages radio frequency (RF) reflection amplifiers to empower phase-configurable reflection elements (REs) in amplifying the incident signal. Thereby, ARIS can enhance wireless communications with the strengthened ARIS-aided links. In this letter, we propose exploiting the signal amplification capability of ARIS for channel estimation, aiming to improve the estimation precision. Nevertheless, the signal amplification inevitably introduces the thermal noise at the ARIS, which can hinder the acquisition of accurate channel state information (CSI) with conventional channel estimation methods based on passive RIS (PRIS). To address this issue, we further investigate this ARIS-specific channel estimation problem and propose a least-square (LS) based channel estimator, whose performance can be further improved with the design on ARIS reflection patterns at the channel training phase. Based on the proposed LS channel estimator, we optimize the training reflection patterns to minimize the channel estimation error variance. Extensive simulation results show that our proposed design can achieve accurate channel estimation in the presence of the ARIS noises.
Abstract:A novel multiuser communication system with movable antennas (MAs) is proposed, where the antenna position optimization is exploited to enhance the downlink sum-rate. The joint optimization of the transmit beamforming vector and transmit MA positions is studied for a multiuser multiple-input single-input system. An efficient algorithm is proposed to tackle the formulated non-convex problem via capitalizing on fractional programming, alternating optimization, and gradient descent methods. To strike a better performance-complexity trade-off, a zero-forcing beamforming-based design is also proposed as an alternative. Numerical investigations are presented to verify the efficiency of the proposed algorithms and their superior performance compared with the benchmark relying on conventional fixed-position antennas (FPAs).
Abstract:A novel over-the-air computation (AirComp) framework, empowered by the incorporation of movable antennas (MAs), is proposed to significantly enhance computation accuracy. Within this framework, the joint optimization of transmit power control, antenna positioning, and receive combining is investigated. An efficient method is proposed to tackle the problem of computation mean-squared error (MSE) minimization, capitalizing on the approach of alternating optimization. Numerical results are provided to substantiate the superior MSE performance of the proposed framework, which establish its clear advantage over benchmark systems employing conventional fixed-position antennas (FPAs).