Sherman
Abstract:Well-designed prompts are crucial for enhancing large language models' (LLMs) reasoning capabilities while aligning their outputs with task requirements across diverse domains. However, manually designed prompts require expertise and iterative experimentation. While existing prompt optimization methods aim to automate this process, they rely heavily on external references such as ground truth or human feedback, limiting their applicability in real-world scenarios where such data is unavailable or costly to obtain. To address this, we propose Self-Supervised Prompt Optimization (SPO), a cost-efficient framework that discovers effective prompts for both closed and open-ended tasks without requiring external references. Motivated by the observations that prompt quality manifests directly in LLM outputs and LLMs can effectively assess adherence to task requirements, we derive evaluation and optimization signals purely from output comparisons. Specifically, SPO selects superior prompts through pairwise output comparisons evaluated by an LLM evaluator, followed by an LLM optimizer that aligns outputs with task requirements. Extensive experiments demonstrate that SPO outperforms state-of-the-art prompt optimization methods, achieving comparable or superior results with significantly lower costs (e.g., 1.1% to 5.6% of existing methods) and fewer samples (e.g., three samples). The code is available at https://github.com/geekan/MetaGPT.
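The select-then-optimize loop described in the SPO abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `run_llm`, `judge_prefers_new`, and `propose_prompt` are hypothetical stand-ins for the executor, evaluator, and optimizer LLM calls, and the toy versions below are deterministic so the loop runs offline.

```python
from typing import Callable, List


def spo_loop(
    initial_prompt: str,
    questions: List[str],
    run_llm: Callable[[str, str], str],
    judge_prefers_new: Callable[[List[str], List[str]], bool],
    propose_prompt: Callable[[str, List[str]], str],
    iterations: int = 5,
) -> str:
    """Keep whichever prompt's outputs win each pairwise comparison."""
    best_prompt = initial_prompt
    best_outputs = [run_llm(best_prompt, q) for q in questions]
    for _ in range(iterations):
        # Optimizer: propose a new prompt from the current best prompt and its outputs.
        candidate = propose_prompt(best_prompt, best_outputs)
        candidate_outputs = [run_llm(candidate, q) for q in questions]
        # Evaluator: compares outputs only; no ground truth is consulted.
        if judge_prefers_new(best_outputs, candidate_outputs):
            best_prompt, best_outputs = candidate, candidate_outputs
    return best_prompt


# Toy stand-ins (hypothetical, for illustration only).
def toy_run_llm(prompt: str, question: str) -> str:
    # Output gets longer each time the prompt asks for step-by-step reasoning.
    return f"{question}: " + "step " * prompt.count("step by step")


def toy_judge(old: List[str], new: List[str]) -> bool:
    # Pretend the evaluator prefers longer, more detailed outputs.
    return sum(map(len, new)) > sum(map(len, old))


def toy_propose(prompt: str, outputs: List[str]) -> str:
    return prompt + " Think step by step."


if __name__ == "__main__":
    samples = ["q1", "q2", "q3"]  # the abstract mentions as few as three samples
    print(spo_loop("Answer the question.", samples, toy_run_llm, toy_judge, toy_propose, iterations=2))
```

In a real setting the three callables would wrap LLM API calls; the loop itself only ever compares outputs of the current best prompt against outputs of the candidate.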
Abstract:With the advancement of large language models (LLMs), an increasing number of student models have leveraged LLMs to analyze textual artifacts generated by students to understand and evaluate their learning. These student models typically employ pre-trained LLMs to vectorize text inputs into embeddings and then use the embeddings to train models to detect the presence or absence of a construct of interest. However, how reliable and robust are these models at processing language with different levels of complexity? In the context of learning where students may have different language backgrounds with various levels of writing skills, it is critical to examine the robustness of such models to ensure that these models work equally well for text with varying levels of language complexity. Indeed, a limited number of studies show that language use can affect the performance of LLMs. As such, in the current study, we examined the robustness of several LLM-based student models that detect student self-regulated learning (SRL) in math problem-solving. Specifically, we compared how the performance of these models varies across texts with high and low lexical, syntactic, and semantic complexity, as measured by three linguistic measures.
Abstract:Rainfall prediction remains a persistent challenge due to the highly nonlinear and complex nature of meteorological data. Existing approaches lack systematic utilization of grid search for optimal hyperparameter tuning, relying instead on heuristic or manual selection, frequently resulting in sub-optimal results. Additionally, these methods rarely incorporate newly constructed meteorological features such as differences between temperature and humidity to capture critical weather dynamics. Furthermore, there is a lack of systematic evaluation of ensemble learning techniques and limited exploration of diverse advanced models introduced in the past one or two years. To address these limitations, we propose a robust ensemble learning grid search-tuned framework (RAINER) for rainfall prediction. RAINER incorporates a comprehensive feature engineering pipeline, including outlier removal, imputation of missing values, feature reconstruction, and dimensionality reduction via Principal Component Analysis (PCA). The framework integrates novel meteorological features to capture dynamic weather patterns and systematically evaluates non-learning, mathematics-based methods and a variety of machine learning models, from weak classifiers to advanced neural networks such as Kolmogorov-Arnold Networks (KAN). By leveraging grid search for hyperparameter tuning and ensemble voting techniques, RAINER achieves promising results on real-world datasets.
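The combination of a constructed difference feature, grid-searched hyperparameters, and ensemble voting can be illustrated with a small standard-library sketch. Everything here is an assumption for illustration: the rows are synthetic, the "models" are single-threshold classifiers rather than the models evaluated in the paper, and outlier removal, imputation, and PCA are omitted.

```python
from itertools import product

# Synthetic rows for illustration only: (temperature_C, humidity_pct, rained).
TRAIN = [(30.0, 40.0, 0), (28.0, 45.0, 0), (25.0, 60.0, 0),
         (22.0, 85.0, 1), (20.0, 90.0, 1), (18.0, 95.0, 1)]
TEST = [(27.0, 50.0, 0), (21.0, 88.0, 1)]


def add_difference(row):
    """Append a constructed feature: humidity minus temperature."""
    temp, hum, label = row
    return (temp, hum, hum - temp, label)


def make_classifier(index, threshold, above_is_rain):
    """Predict rain from one feature compared against a threshold."""
    def predict(features):
        return int((features[index] > threshold) == above_is_rain)
    return predict


def accuracy(predict, rows):
    # The last element of each row is the label; the rest are features.
    return sum(predict(r[:-1]) == r[-1] for r in rows) / len(rows)


def grid_search(index, rows):
    """Exhaustively try every (threshold, direction) pair for one feature."""
    thresholds = sorted({r[index] for r in rows})
    grid = product(thresholds, [True, False])
    best = max(grid, key=lambda p: accuracy(make_classifier(index, *p), rows))
    return make_classifier(index, *best)


def hard_vote(classifiers):
    """Majority vote over the binary predictions of several classifiers."""
    def predict(features):
        votes = sum(c(features) for c in classifiers)
        return int(2 * votes > len(classifiers))
    return predict


train = [add_difference(r) for r in TRAIN]
test = [add_difference(r) for r in TEST]
# One tuned classifier per feature (temperature, humidity, difference).
ensemble = hard_vote([grid_search(i, train) for i in range(3)])
test_accuracy = accuracy(ensemble, test)
```

The grid is tuned on the training rows only and the ensemble is scored on the held-out rows, which mirrors the usual separation between tuning and evaluation.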
Abstract:Reconfigurable intelligent surfaces (RIS)-assisted cell-free massive multiple-input multiple-output (CF mMIMO) systems have emerged as a promising technology for sixth-generation communication systems. These systems capitalize on RIS to minimize power consumption, thereby achieving consistent performance and enhancing communication quality through the establishment and shaping of auxiliary signal propagation pathways between access points (APs) and users. However, integrating RIS into existing CF mMIMO infrastructures presents several technical challenges. This study delves into the signal transmission scheme and deployment architecture of RIS-aided CF mMIMO systems, addressing inherent challenges such as interference induced by RIS and the increased complexity in beam alignment. Furthermore, we address the complexities arising from the joint optimization of the reflection phase of RIS and beamforming technology at the APs, intending to fully exploit the reflection capabilities of RISs and beamforming technology to maximize the energy efficiency (EE) of the system. To overcome these challenges, we propose cooperative communication to suppress RIS-induced interference, beam tracking, and joint optimization to improve system EE. We also present specific examples of cooperative communication under the constraint of electromagnetic interference and the beam tracking of a mobile system. Additionally, we emphasize important research directions for RIS-aided CF mMIMO systems, aiming to inspire future investigations.
Abstract:The key technologies of sixth generation (6G), such as ultra-massive multiple-input multiple-output (MIMO), enable intricate interactions between antennas and wireless propagation environments. As a result, it becomes necessary to develop joint models that encompass both antennas and wireless propagation channels. To achieve this, we utilize the multi-port communication theory, which considers impedance matching among the source, transmission medium, and load to facilitate efficient power transfer. Specifically, we first investigate the impact of insertion loss, mutual coupling, and other factors on the performance of multi-port matching networks. Next, to further improve system performance, we explore two important deep unfolding designs for the multi-port matching networks: hybrid beamforming and power control. For the hybrid beamforming, we develop a deep unfolding framework, i.e., projected gradient descent (PGD)-Net based on unfolding projected gradient descent. For the power control, we design a deep unfolding network, graph neural network (GNN) aided alternating optimization (AO)-Net, which considers the interaction between different ports in optimizing power allocation. Numerical results verify the necessity of considering insertion loss in the dynamic metasurface antenna (DMA) performance analysis. Besides, the proposed PGD-Net based hybrid beamforming approaches approximate the conventional model-based algorithm with very low complexity. Moreover, our proposed power control scheme has a fast run time compared to the traditional weighted minimum mean squared error (WMMSE) method.
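The idea of unfolding projected gradient descent can be sketched on a toy problem: a fixed number of "layers", each performing one gradient step followed by a projection, with a separate step size per layer. This is a generic illustration, not the paper's PGD-Net. The least-squares objective and box constraint below are swapped in for simplicity, and the step sizes are fixed by hand, whereas a trained unfolded network would learn them.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matvec(a: Matrix, x: Vector) -> Vector:
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def gradient(a: Matrix, b: Vector, x: Vector) -> Vector:
    """Gradient of 0.5 * ||A x - b||^2, i.e. A^T (A x - b)."""
    residual = [ri - bi for ri, bi in zip(matvec(a, x), b)]
    return [sum(a[i][j] * residual[i] for i in range(len(a))) for j in range(len(x))]


def project_box(x: Vector, lo: float = -1.0, hi: float = 1.0) -> Vector:
    """Project onto the box [lo, hi]^n (a stand-in for the real constraint set)."""
    return [min(max(xi, lo), hi) for xi in x]


def unfolded_pgd(a: Matrix, b: Vector, x0: Vector, step_sizes: Vector) -> Vector:
    """Run one PGD iteration per entry of step_sizes (one 'layer' each)."""
    x = list(x0)
    for step in step_sizes:
        g = gradient(a, b, x)
        x = project_box([xi - step * gi for xi, gi in zip(x, g)])
    return x


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(unfolded_pgd(identity, [2.0, 0.5], [0.0, 0.0], [1.0, 1.0, 1.0]))
```

Because the number of layers is fixed in advance, the per-inference cost is known up front, which is one reason unfolded designs are often cheaper than running an iterative solver to convergence.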
Abstract:The rotary and movable antennas (ROMA) technology is efficient in enhancing wireless network capacity by adjusting both the antenna spacing and three-dimensional (3D) rotation of antenna surfaces, based on the spatial distribution of users and channel statistics. Applying ROMA to high-speed rail (HSR) wireless communications can significantly improve system performance in terms of array gain and spatial multiplexing. However, the rapidly changing channel conditions in HSR scenarios present challenges for ROMA configuration. In this correspondence, we propose an analytical framework for configuring a ROMA-based extremely large-scale multiple-input-multiple-output (XL-MIMO) system in HSR scenarios based on spatial correlation. First, we develop a localization model based on a mobility-aware near-field beam training algorithm to determine the real-time position of the train relay antennas. Next, we derive the expression for channel orthogonality and antenna spacing based on the spatial correlation matrix, and obtain the optimal antenna spacing when the transceiver panels are aligned in parallel. Moreover, we propose an optimization algorithm for the rotation angle of the transceiver panels, leveraging the differential evolution method, to determine the optimal angle. Finally, numerical results are provided to validate the computational results and optimization algorithm.
Abstract:Cell-free massive multiple-input multiple-output (mMIMO) offers significant advantages in mobility scenarios, mainly due to the elimination of cell boundaries and strong macro diversity. In this paper, we examine the downlink performance of cell-free mMIMO systems equipped with mobile access points (APs) utilizing the concept of unmanned aerial vehicles, where mobility and power control are jointly considered to effectively enhance coverage and suppress interference. However, the high computational complexity, poor collaboration, limited scalability, and uneven reward distribution of conventional optimization schemes lead to serious performance degradation and instability. These factors complicate the provision of consistent and high-quality service across all user equipments in downlink cell-free mMIMO systems. Consequently, we propose a novel scalable framework enhanced by multi-agent reinforcement learning (MARL) to tackle these challenges. The established framework incorporates a graph neural network (GNN)-aided communication mechanism to facilitate effective collaboration among agents, a permutation architecture to improve scalability, and a directional decoupling architecture to accurately distinguish contributions. In the numerical results, we present comparisons of different optimization schemes and network architectures, which reveal that the proposed scheme can effectively enhance system performance compared to conventional schemes due to the adoption of advanced technologies. In particular, appropriately compressing the observation space of agents is beneficial for achieving a better balance between performance and convergence.
Abstract:Extremely large-scale multiple-input multiple-output (XL-MIMO) is gaining attention as a prominent technology for enabling the sixth-generation (6G) wireless networks. However, the vast antenna array and the huge bandwidth introduce a non-negligible beam squint effect, causing beams of different frequencies to focus at different locations. One approach to cope with this is to employ true-time-delay lines (TTDs)-based beamforming to control the range and trajectory of near-field beam squint, known as the near-field controllable beam squint (CBS) effect. In this paper, we investigate user localization in near-field wideband XL-MIMO systems under the beam squint effect and spatial non-stationary properties. Firstly, we derive the expressions for Cramér-Rao Bounds (CRBs) for characterizing the performance of estimating both angle and distance. This analysis aims to assess the potential of leveraging CBS for precise user localization. Secondly, a user localization scheme combining CBS and beam training is proposed. Specifically, we organize multiple subcarriers into groups, directing beams from different groups to distinct angles or distances through the CBS to obtain the estimates of users' angles and distances. Furthermore, we design a user localization scheme based on a convolutional neural network model, namely ConvNeXt. This scheme utilizes the inputs and outputs of the CBS-based scheme to generate high-precision estimates of angle and distance. More importantly, our proposed ConvNeXt-based user localization scheme achieves centimeter-level accuracy in localization estimates.
Abstract:Cell-free (CF) massive multiple-input multiple-output (mMIMO) and reconfigurable intelligent surface (RIS) are two advanced transceiver technologies for realizing future sixth-generation (6G) networks. In this paper, we investigate the joint precoding and access point (AP) selection for energy-efficient RIS-aided CF mMIMO systems. To address the associated computational complexity and communication power consumption, we advocate for user-centric dynamic networks in which each user is served by a subset of APs rather than by all of them. Based on the user-centric network, we formulate a joint precoding and AP selection problem to maximize the energy efficiency (EE) of the considered system. To solve this complex nonconvex problem, we propose an innovative double-layer multi-agent reinforcement learning (MARL)-based scheme. Moreover, we propose an adaptive power threshold-based AP selection scheme to further enhance the EE of the considered system. To reduce the computational complexity of the RIS-aided CF mMIMO system, we introduce a fuzzy logic (FL) strategy into the MARL scheme to accelerate convergence. The simulation results show that the proposed FL-based MARL cooperative architecture effectively improves EE performance, offering an 85% enhancement over the zero-forcing (ZF) method, and achieves faster convergence speed compared with MARL. It is important to note that increasing the transmission power of the APs or the number of RIS elements can effectively enhance the spectral efficiency (SE) performance, which also leads to an increase in power consumption, resulting in a non-trivial trade-off between the quality of service and EE performance.
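For reference, energy efficiency in this line of work is commonly defined as the ratio of achievable sum rate to total consumed power. The expression below is that generic definition, not the paper's exact model. The symbols $B$ (bandwidth), $K$ (number of users), $\mathrm{SINR}_k$ (signal-to-interference-plus-noise ratio of user $k$), and $P_{\text{total}}$ (transmit plus circuit power) are assumed notation.

```latex
\mathrm{EE} = \frac{B \sum_{k=1}^{K} \log_2\left(1 + \mathrm{SINR}_k\right)}{P_{\text{total}}}
\quad \text{[bit/Joule]}
```

Under this definition, raising AP transmit power increases the numerator at most logarithmically while the denominator grows linearly, which is one way to read the SE and EE trade-off noted at the end of the abstract.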
Abstract:Large language models (LLMs) have demonstrated remarkable potential in solving complex tasks across diverse domains, typically by employing agentic workflows that follow detailed instructions and operational sequences. However, constructing these workflows requires significant human effort, limiting scalability and generalizability. Recent research has sought to automate the generation and optimization of these workflows, but existing methods still rely on initial manual setup and fall short of achieving fully automated and effective workflow generation. To address this challenge, we reformulate workflow optimization as a search problem over code-represented workflows, where LLM-invoking nodes are connected by edges. We introduce AFlow, an automated framework that efficiently explores this space using Monte Carlo Tree Search, iteratively refining workflows through code modification, tree-structured experience, and execution feedback. Empirical evaluations across six benchmark datasets demonstrate AFlow's efficacy, yielding a 5.7% average improvement over state-of-the-art baselines. Furthermore, AFlow enables smaller models to outperform GPT-4o on specific tasks at 4.55% of its inference cost in dollars. The code will be available at https://github.com/geekan/MetaGPT.