Abstract:Collective decision-making is crucial to information and communication systems. Decision conflicts among agents hinder the maximization of potential utilities of the entire system. Quantum processes can realize conflict-free joint decisions between two agents using the entanglement of photons or quantum interference of orbital angular momentum (OAM). However, previous studies have always presented symmetric resultant joint decisions. Although this property helps preserve equality, it cannot resolve disparities. Global challenges, such as ethics and equity, are recognized in the field of responsible artificial intelligence as part of the responsible research and innovation paradigm. Thus, decision-making systems must not only preserve existing equality but also tackle disparities. This study theoretically and numerically investigates asymmetric collective decision-making using quantum interference of photons carrying OAM or entangled photons. Although asymmetry is successfully realized, photon loss is inevitable in the proposed models. The available range of asymmetry and a method for obtaining the desired degree of asymmetry are analytically formulated.
Abstract:Recently, extensive studies have been conducted on photonic reinforcement learning, which accelerates computation by exploiting the physical nature of light. Previous studies utilized quantum interference of photons to achieve collective decision-making without choice conflicts when solving the competitive multi-armed bandit problem, a fundamental example of reinforcement learning. However, the bandit problem deals with a static environment where the agent's action does not influence the reward probabilities. This study aims to extend the conventional approach to more general multi-agent reinforcement learning, targeting the grid world problem. Unlike the conventional approach, the proposed scheme deals with a dynamic environment where the reward changes because of agents' actions. A successful photonic reinforcement learning scheme requires both a photonic system that contributes to the quality of learning and a suitable algorithm. This study proposes a novel learning algorithm, discontinuous bandit Q-learning, in view of a potential photonic implementation. Here, state-action pairs in the environment are regarded as slot machines in the context of the bandit problem, and the amount by which a Q-value is updated is regarded as the reward of the bandit problem. We perform numerical simulations to validate the effectiveness of the bandit algorithm. In addition, we propose a multi-agent architecture in which agents are indirectly connected through quantum interference of light, and quantum principles ensure the conflict-free property of state-action pair selections among agents. We demonstrate that multi-agent reinforcement learning can be accelerated owing to conflict avoidance among multiple agents.
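The mapping in this abstract, where each state-action pair acts as a slot machine and the update amount of its Q-value acts as the reward, can be illustrated with the standard tabular Q-learning update. The sketch below is only an illustration under stated assumptions: the function name `q_update_amount`, the learning rate `alpha`, the discount factor `gamma`, and the toy Q-table are hypothetical and not taken from the paper, and the actual discontinuous bandit Q-learning algorithm may differ.

```python
def q_update_amount(q_sa, reward, next_q_values, alpha=0.1, gamma=0.9):
    """Return the standard Q-learning update amount for one state-action pair.

    q_sa          -- current Q-value of the selected state-action pair
    reward        -- immediate reward observed after taking the action
    next_q_values -- Q-values of all actions in the next state
    alpha, gamma  -- learning rate and discount factor (assumed values)
    """
    best_next = max(next_q_values) if next_q_values else 0.0
    return alpha * (reward + gamma * best_next - q_sa)


# Hypothetical toy Q-table with two states and two actions.
q_table = {
    ("s0", "a0"): 0.0,
    ("s0", "a1"): 0.0,
    ("s1", "a0"): 1.0,
    ("s1", "a1"): 0.5,
}

# Taking a0 in s0 leads to s1 with no immediate reward.
delta = q_update_amount(q_table[("s0", "a0")], reward=0.0,
                        next_q_values=[q_table[("s1", "a0")],
                                       q_table[("s1", "a1")]])

# In the bandit view, delta would be the "reward" credited to the
# slot machine (s0, a0); here it is also applied to the Q-table.
q_table[("s0", "a0")] += delta
```

Treating `delta` as a bandit reward means a state-action pair is valued by how much it still changes the Q-table, which is one plausible reading of the abstract rather than a confirmed detail of the method.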
Abstract:Collective decision-making is vital for recent information and communications technologies. In our previous research, we mathematically derived conflict-free joint decision-making that optimally satisfies players' probabilistic preference profiles. However, two problems exist regarding the optimal joint decision-making method. First, as the number of choices increases, the computational cost of calculating the optimal joint selection probability matrix explodes. Second, to derive the optimal joint selection probability matrix, all players must disclose their probabilistic preferences. Now, it is noteworthy that explicit calculation of the joint probability distribution is not necessarily needed; what is necessary for collective decisions is sampling. This study examines several sampling methods that converge to heuristic joint selection probability matrices that satisfy players' preferences. We show that they can significantly reduce the above problems of computational cost and confidentiality. We analyze the probability distribution each of the sampling methods converges to, as well as the computational cost required and the confidentiality secured. In particular, we introduce two conflict-free joint sampling methods through quantum interference of photons. The first system allows the players to hide their choices while satisfying the players' preferences almost perfectly when they have the same preferences. The second system, where the physical nature of light replaces the expensive computational cost, also conceals their choices under the assumption that they have a trusted third party.
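The point that sampling, rather than explicit calculation of the joint probability matrix, is sufficient can be illustrated with a simple sequential scheme. The sketch below is an assumption for illustration only and is not claimed to be one of the methods examined in the paper: player A draws from its own preference, then player B draws from its preference restricted to the remaining choices. The function name `sample_conflict_free` and the preference values are hypothetical.

```python
import random


def sample_conflict_free(pref_a, pref_b, rng):
    """Sample a pair of distinct choices (a, b) for two players.

    Player A draws from pref_a; player B then draws from pref_b
    restricted to the choices A did not take. Assumes every entry of
    pref_b is positive so the restricted weights never sum to zero.
    """
    choices = list(range(len(pref_a)))
    a = rng.choices(choices, weights=pref_a, k=1)[0]
    remaining = [c for c in choices if c != a]
    weights_b = [pref_b[c] for c in remaining]
    b = rng.choices(remaining, weights=weights_b, k=1)[0]
    return a, b


rng = random.Random(0)          # fixed seed for reproducibility
pref_a = [0.5, 0.3, 0.2]        # hypothetical preference of player A
pref_b = [0.2, 0.2, 0.6]        # hypothetical preference of player B

samples = [sample_conflict_free(pref_a, pref_b, rng) for _ in range(1000)]
conflicts = sum(1 for a, b in samples if a == b)
```

No joint matrix is ever built, and each draw costs time linear in the number of choices. The price is that player B's resulting marginal distribution generally deviates from `pref_b`, which is the kind of trade-off between cost and preference satisfaction that the abstract analyzes.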
Abstract:Accelerating artificial intelligence by photonics is an active field of study aiming to exploit the unique properties of photons. Reinforcement learning is an important branch of machine learning, and photonic decision-making principles have been demonstrated with respect to multi-armed bandit problems. However, reinforcement learning could involve a massive number of states, unlike previously demonstrated bandit problems where the number of states is only one. Q-learning is a well-known approach in reinforcement learning that can deal with many states. The architecture of Q-learning, however, does not fit photonic implementations well because it separates the update rule from the action selection. In this study, we organize a new architecture for multi-state reinforcement learning as a parallel array of bandit problems in order to benefit from photonic decision-makers, which we call parallel bandit architecture for reinforcement learning, or PBRL for short. Taking a cart-pole balancing problem as an instance, we demonstrate that PBRL adapts to the environment in fewer time steps than Q-learning. Furthermore, PBRL yields faster adaptation when operated with a chaotic laser time series than with uniformly distributed pseudorandom numbers; the autocorrelation inherent in the laser chaos provides a positive effect. We also find that the variety of states that the system undergoes during the learning phase exhibits completely different properties between PBRL and Q-learning. The insights obtained through the present study are also beneficial for existing computing platforms, not just photonic realizations, in accelerating performance with the PBRL algorithm and correlated random sequences.
Abstract:Photonic accelerators have been intensively studied to provide enhanced information processing capability by benefiting from the unique attributes of physical processes. Recently, it has been reported that chaotically oscillating ultrafast time series from a laser, called laser chaos, provide the ability to solve multi-armed bandit (MAB) problems or decision-making problems at GHz order. Furthermore, it has been confirmed that the negatively correlated time-domain structure of laser chaos contributes to the acceleration of decision-making. However, the underlying mechanism of why decision-making is accelerated by correlated time series is unknown. In this paper, we present a theoretical model to account for the acceleration of decision-making by correlated time sequences. We first confirm the effectiveness of the negative autocorrelation inherent in time series for solving two-armed bandit problems using Fourier transform surrogate methods. We propose a theoretical model, inspired by correlated random walks, that treats the correlated time series fed to the decision-making system and the internal status of the system in a unified manner. We demonstrate that the performance derived analytically by the theory agrees well with the numerical simulations, which confirms the validity of the proposed model and leads to optimal system design. The present study paves a new way toward exploiting correlated time series for decision-making, impacting artificial intelligence and other applications.
Abstract:In recent cross-disciplinary studies involving both optics and computing, single-photon-based decision-making has been demonstrated by utilizing the wave-particle duality of light to solve multi-armed bandit problems. Furthermore, entangled-photon-based decision-making has managed to solve a competitive multi-armed bandit problem in such a way that conflicts of decisions among players are avoided while ensuring equality. However, as these studies are based on the polarization of light, the number of available choices is limited to two, corresponding to two orthogonal polarization states. Here we propose a scalable principle to solve competitive decision-making situations by using the orbital angular momentum as the tunable degree of freedom of photons, which theoretically allows an unlimited number of arms. Moreover, by extending the Hong-Ou-Mandel effect to more than two states, we theoretically establish an experimental configuration able to generate entangled photon states with orbital angular momentum and conditions that provide conflict-free selections at every turn. We numerically examine total rewards regarding three-armed bandit problems, for which the proposed strategy accomplishes almost the theoretical maximum, which is greater than a conventional mixed strategy intending to realize Nash equilibrium. This is thanks to the entanglement property that achieves no-conflict selections, even in the exploring phase to find the best arms.
Abstract:By exploiting ultrafast and irregular time series generated by lasers with delayed feedback, we have previously demonstrated a scalable algorithm to solve multi-armed bandit (MAB) problems utilizing the time-division multiplexing of laser chaos time series. Although the algorithm detects the arm with the highest reward expectation, the correct recognition of the order of arms in terms of reward expectations is not achievable. Here, we present an algorithm where the degree of exploration is adaptively controlled based on confidence intervals that represent the estimation accuracy of reward expectations. We demonstrate numerically that our approach significantly improves arm order recognition accuracy and reduces dependence on reward environments, while the total reward is almost maintained compared with conventional MAB methods. This study applies to sectors where the order information is critical, such as efficient allocation of resources in information and communications technology.
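As a point of reference for exploration controlled by confidence intervals, the sketch below shows the conventional UCB1 index, a well-known MAB policy. It is a swapped-in textbook technique and not the laser-chaos-based algorithm of the abstract; the function names `ucb1_index` and `select_arm` and the numbers in the example are hypothetical.

```python
import math


def ucb1_index(mean_reward, pulls, total_pulls):
    """Upper confidence bound of the UCB1 policy for one arm.

    The bonus term shrinks as the arm is pulled more often, so arms
    whose reward expectation is poorly estimated get explored more.
    """
    if pulls == 0:
        return math.inf  # an untried arm is always explored first
    return mean_reward + math.sqrt(2.0 * math.log(total_pulls) / pulls)


def select_arm(means, counts):
    """Return the index of the arm with the largest UCB1 index."""
    total = sum(counts)
    indices = [ucb1_index(m, n, total) for m, n in zip(means, counts)]
    return max(range(len(indices)), key=indices.__getitem__)


# Hypothetical estimates: arm 1 has been pulled only a few times,
# so its confidence interval is wide.
means = [0.6, 0.5, 0.4]
counts = [100, 5, 100]
chosen = select_arm(means, counts)
```

In this example the rarely pulled arm 1 is selected even though its estimated mean is not the highest, because its wide confidence interval leaves its rank among the arms uncertain.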
Abstract:Generative adversarial networks (GANs) are gaining increased importance in artificially constructing natural images and related functionalities, wherein two networks called the generator and the discriminator evolve through adversarial mechanisms. Using deep convolutional neural networks and related techniques, high-resolution, highly realistic scenes and human faces, among others, have been generated. While GANs in general need a large amount of genuine training data, it is noteworthy that vast amounts of pseudorandom numbers are also required. Here we utilize chaotic time series generated experimentally by semiconductor lasers for the latent variables of a GAN, whereby the inherent nature of chaos can be reflected or transformed into the generated output data. We show that the similarity in proximity, which is a degree of robustness of the generated images with respect to a minute change in the input latent variables, is enhanced while the versatility as a whole is not severely degraded. Furthermore, we demonstrate that the surrogate chaos time series eliminates the signature of generated images that is originally observed corresponding to the negative autocorrelation inherent in the chaos sequence. We also discuss the impact of utilizing chaotic time series in retrieving images from the trained generator.
Abstract:The competitive multi-armed bandit (CMAB) problem is related to social issues such as maximizing total social benefits while preserving equality among individuals by overcoming conflicts between individual decisions, which could seriously decrease social benefits. The study described herein provides experimental evidence that entangled photons physically resolve the CMAB, maximizing the social rewards while ensuring equality. Moreover, by exploiting the requirement that entangled photons share a common polarization basis, we demonstrated that deception, or delaying the other player receiving a greater reward, cannot be accomplished in a polarization-entangled-photon-based system, while deception is achievable in systems based on classical or polarization-correlated photons. Autonomous alignment schemes for polarization bases were also experimentally demonstrated based on decision conflict information. This study provides the foundation for collective decision making based on polarization-entangled photons and their polarization and value alignment, which is essential for utilizing quantum light for intelligent functionalities.