Abstract:The performance of sensor arrays in sensing and wireless communications improves with more elements, but this comes at the cost of increased energy consumption and hardware expense. This work addresses the challenge of selecting $k$ sensor elements from a set of $m$ to optimize a generic Quality-of-Service metric. Evaluating all $\binom{m}{k}$ possible sensor subsets is impractical, leading to prior solutions using convex relaxations, greedy algorithms, and supervised learning approaches. The current paper proposes a new framework that employs deep generative modeling, treating sensor selection as a deterministic Markov Decision Process where sensor subsets of size $k$ arise as terminal states. Generative Flow Networks (GFlowNets) are employed to model an action distribution conditioned on the state. Sampling actions from the aforementioned distribution ensures that the probability of arriving at a terminal state is proportional to the performance of the corresponding subset. Applied to a standard sensor selection scenario, the developed approach outperforms popular methods which are based on convex optimization and greedy algorithms. Finally, a multiobjective formulation of the proposed approach is adopted and applied on the sparse antenna array design for Integrated Sensing and Communication (ISAC) systems. The multiobjective variation is shown to perform well in managing the trade-off between radar and communication performance.
Abstract:The paper studies the problem of designing the Intelligent Reflecting Surface (IRS) phase shifters for Multiple Input Single Output (MISO) communication systems in spatiotemporally correlated channel environments, where the destination can move within a confined area. The objective is to maximize the expected sum of SNRs at the receiver over infinite time horizons. The problem formulation gives rise to a Markov Decision Process (MDP). We propose a deep actor-critic algorithm that accounts for channel correlations and destination motion by constructing the state representation to include the current position of the receiver and the phase shift values and receiver positions that correspond to a window of previous time steps. The channel variability induces high frequency components on the spectrum of the underlying value function. We propose the preprocessing of the critic's input with a Fourier kernel which enables stable value learning. Finally, we investigate the use of the destination SNR as a component of the designed MDP state, which is common practice in previous work. We provide empirical evidence that, when the channels are spatiotemporally correlated, the inclusion of the SNR in the state representation interacts with function approximation in ways that inhibit convergence.