Abstract: A multi-access wireless network with N transmitting nodes, each equipped with an energy harvesting (EH) device and a rechargeable battery of finite capacity, is studied. At each time slot (TS) a node is operative with a certain probability, which may depend on the availability of data or on the state of its channel. The energy arrival process at each node is modelled as an independent two-state Markov process, such that, at each TS, a node harvests either one unit of energy or none. At each TS a subset of the nodes is scheduled by the access point (AP). The scheduling policy that maximises the total throughput is studied, assuming that the AP knows neither the states of the EH processes nor the states of the batteries. The problem is identified as a restless multi-armed bandit (RMAB) problem, and an upper bound on the performance of the optimal scheduling policy is found. Under certain assumptions regarding the EH processes and the battery sizes, the optimality of the myopic policy (MP) is proven. For the general case, the performance of the MP is compared numerically to the upper bound.
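The system model described in this abstract can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the paper's method: the function name `simulate_round_robin`, all parameter values, the round-robin schedule (chosen because it needs no knowledge of EH or battery states), and the reward rule (a scheduled, operative node spends its whole battery, with throughput counted as the energy spent) are hypothetical choices that the abstract does not specify.

```python
import random


def simulate_round_robin(n_nodes=4, k_scheduled=2, battery_cap=3,
                         p_stay_harvest=0.8, p_stay_idle=0.7,
                         p_operative=0.9, horizon=10_000, seed=0):
    """Average reward per time slot under a blind round-robin schedule.

    Assumes k_scheduled <= n_nodes so each slot schedules distinct nodes.
    """
    rng = random.Random(seed)
    # Two-state Markov EH process per node: True = harvesting state.
    harvesting = [rng.random() < 0.5 for _ in range(n_nodes)]
    battery = [0] * n_nodes
    delivered = 0

    for t in range(horizon):
        # The AP schedules k nodes without observing any node state.
        start = (t * k_scheduled) % n_nodes
        scheduled = [(start + i) % n_nodes for i in range(k_scheduled)]

        for i in scheduled:
            # Assumption: an operative node empties its battery, and the
            # reward equals the number of energy units spent.
            if rng.random() < p_operative and battery[i] > 0:
                delivered += battery[i]
                battery[i] = 0

        for i in range(n_nodes):
            # Markov transition of the EH state, independent across nodes.
            stay = p_stay_harvest if harvesting[i] else p_stay_idle
            if rng.random() >= stay:
                harvesting[i] = not harvesting[i]
            # One unit is harvested in the harvesting state; the finite
            # battery discards any overflow.
            if harvesting[i]:
                battery[i] = min(battery_cap, battery[i] + 1)

    return delivered / horizon
```

Calling `simulate_round_robin(battery_cap=1)` and `simulate_round_robin(battery_cap=10)` gives a rough sense of how much energy is lost to battery overflow under this toy reward model.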
Abstract: We consider content-level selective offloading of cellular downlink traffic to a wireless infostation terminal, which stores high data-rate content in its cache memory. Cellular users in the vicinity of the infostation can download the stored content directly from the infostation through a broadband connection (e.g., WiFi), reducing both the latency and the load on the cellular network. The goal of the infostation cache controller (CC) is to store the most popular content in the cache memory, such that the maximum amount of traffic is offloaded to the infostation. In practice, the popularity profile of the files is not known by the CC, which observes only the instantaneous demands for the content stored in the cache. Hence, the cache content placement is optimised based on the demand history and on the cost associated with placing each content in the cache. By refreshing the cache content at regular time intervals, the CC gradually learns the popularity profile, while at the same time exploiting the limited cache capacity in the best way possible. This is formulated as a multi-armed bandit (MAB) problem with switching cost. Several algorithms are presented to decide on the cache content over time. The performance is measured in terms of cache efficiency, defined as the amount of net traffic that is offloaded to the infostation. In addition to theoretical regret bounds, the proposed algorithms are analysed through numerical simulations. In particular, the impact of system parameters, such as the number of files, the number of users, the cache size, and the skewness of the popularity profile, on the performance is studied numerically. It is shown that the proposed algorithms learn the popularity profile quickly for a wide range of system parameters.
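The cache placement loop described in this abstract can be sketched as a bandit simulation. This is an illustrative sketch only and is not one of the algorithms proposed in the paper: it uses a generic upper-confidence-bound (UCB) index, and the names `zipf_profile` and `run_ucb_cache`, the Zipf demand model, the exploration constant `explore`, and the unit `switch_cost` are all assumptions. The sketch charges the switching cost when a file is newly placed, but it does not take that cost into account when choosing the cache content.

```python
import math
import random


def zipf_profile(n_files, skew):
    """Assumed popularity profile: Zipf weights normalised to sum to 1."""
    weights = [1.0 / (rank ** skew) for rank in range(1, n_files + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def run_ucb_cache(n_files=20, cache_size=4, n_users=50, skew=1.0,
                  switch_cost=1.0, explore=2.0, periods=500, seed=0):
    """Return (average net offloaded requests per period, final cache)."""
    rng = random.Random(seed)
    popularity = zipf_profile(n_files, skew)  # hidden from the controller
    pulls = [0] * n_files          # periods each file spent in the cache
    mean_demand = [0.0] * n_files  # running mean of observed demand
    cache = set()
    net_offloaded = 0.0

    for t in range(1, periods + 1):
        def index(f):
            # Files never cached so far are tried first.
            if pulls[f] == 0:
                return float("inf")
            return mean_demand[f] + explore * math.sqrt(math.log(t) / pulls[f])

        # Refresh the cache with the files of highest index.
        ranked = sorted(range(n_files), key=index, reverse=True)
        new_cache = set(ranked[:cache_size])
        # Each newly placed file incurs the switching cost.
        net_offloaded -= switch_cost * len(new_cache - cache)
        cache = new_cache

        # Each user requests one file per period (assumed demand model).
        requests = rng.choices(range(n_files), weights=popularity, k=n_users)
        for f in cache:
            # Only demands for cached files are observed and offloaded.
            demand = requests.count(f)
            pulls[f] += 1
            mean_demand[f] += (demand - mean_demand[f]) / pulls[f]
            net_offloaded += demand

    return net_offloaded / periods, cache
```

Varying `skew`, `cache_size`, or `n_users` in `run_ucb_cache` mirrors, in a very simplified form, the kind of parameter study the abstract mentions.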
Abstract: A point-to-point wireless communication system in which the transmitter is equipped with an energy harvesting device and a rechargeable battery is studied. Both the energy and the data arrivals at the transmitter are modeled as Markov processes. Delay-limited communication is considered, assuming that the underlying channel is block fading with memory and that the instantaneous channel state information is available at both the transmitter and the receiver. The expected total transmitted data during the transmitter's activation time is maximized under three different sets of assumptions regarding the information available at the transmitter about the underlying stochastic processes. A learning theoretic approach is introduced, which does not assume any a priori information on the Markov processes governing the communication system. In addition, online and offline optimization problems are studied for the same setting. Full statistical knowledge and causal information on the realizations of the underlying stochastic processes are assumed in the online optimization problem, while the offline optimization problem assumes non-causal knowledge of the realizations in advance. By comparing the optimal solutions in all three frameworks, the performance loss due to the transmitter's lack of information about the behavior of the underlying Markov processes is quantified.