Device-to-device (D2D) communication, which allows users in close proximity to communicate directly, has recently been proposed to improve the spectral efficiency of cellular networks. In this paper, we consider a cellular network consisting of multiple cellular user equipments (CUEs), which are the primary users, and a cognitive D2D pair, which is the secondary user. The D2D pair needs bandwidth for data transmission, which it can obtain via spectrum trading. We introduce a bandwidth-auction game for this spectrum trading problem. The base station (BS) and the CUEs can either sell their spectrum to the D2D pair or share it with the pair, which allows the D2D pair to operate in the orthogonal sharing, cellular, or non-orthogonal sharing (NOS) mode. Because operation in the NOS mode causes interference to the CUEs, this mode is possible only under low-interference conditions. In the auction, the D2D pair can buy its required spectrum from three different service providers (SPs), one corresponding to each mode, which operate on different frequency bands. The D2D pair bids a price-bandwidth demand curve, and the SPs offer price-bandwidth supply curves. Since, in practical scenarios, no player is aware of the other players' strategies, the game is modeled as a repeated game with incomplete information. A best-response-based learning method is proposed for the decision-making procedure of all players, i.e., the D2D pair and the SPs. It is shown that, when the game is played repeatedly, the proposed method converges to the Nash equilibrium (NE) of the game more rapidly than state-of-the-art methods. The proposed method is also less sensitive to the learning rate than state-of-the-art methods and can therefore be considered robust.
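The abstract does not specify the learning rule, so the following is only a minimal sketch of generic damped (learning-rate-weighted) best-response dynamics in a two-player price/bandwidth exchange. It is not the authors' method. The buyer utility w·ln(1+b) − p·b, the linear supply curve p = c·b, the single SP (instead of three), and all names (`buyer_best_response`, `seller_best_response`, `damped_best_response`, `closed_form_equilibrium`, `w`, `c`, `eta`) are hypothetical assumptions chosen for illustration. Each player is assumed to observe only the other player's previous move, not its utility.

```python
import math

# Hypothetical toy model (NOT taken from the paper):
#   D2D buyer utility U(b) = w*ln(1+b) - p*b  ->  demand b*(p) = max(w/p - 1, 0)
#   single SP with an assumed linear price-supply curve p*(b) = c*b


def buyer_best_response(price: float, w: float) -> float:
    """Bandwidth maximizing w*ln(1+b) - price*b over b >= 0 (requires price > 0)."""
    return max(w / price - 1.0, 0.0)


def seller_best_response(bandwidth: float, c: float) -> float:
    """Price quoted on the assumed linear supply curve p = c*b."""
    return c * bandwidth


def damped_best_response(w: float = 10.0, c: float = 1.0, eta: float = 0.3,
                         b0: float = 1.0, p0: float = 1.0,
                         tol: float = 1e-9, max_iter: int = 10_000):
    """Repeat the game until both strategies stop changing.

    eta acts as the learning rate: each round, a player moves only a fraction
    eta of the way toward its best response to the other player's last move.
    Returns (bandwidth, price, rounds).
    """
    if not 0.0 < eta < 1.0:
        raise ValueError("eta must lie in (0, 1) so the price stays positive")
    b, p = b0, p0
    for t in range(1, max_iter + 1):
        # Simultaneous update: both players react to the previous round.
        b_new = (1.0 - eta) * b + eta * buyer_best_response(p, w)
        p_new = (1.0 - eta) * p + eta * seller_best_response(b, c)
        if abs(b_new - b) < tol and abs(p_new - p) < tol:
            return b_new, p_new, t
        b, p = b_new, p_new
    raise RuntimeError("did not converge within max_iter rounds")


def closed_form_equilibrium(w: float, c: float):
    """Intersection of b = w/p - 1 and p = c*b, i.e. c*b^2 + c*b - w = 0."""
    b = (-c + math.sqrt(c * c + 4.0 * c * w)) / (2.0 * c)
    return b, c * b


if __name__ == "__main__":
    b, p, rounds = damped_best_response()
    b_star, p_star = closed_form_equilibrium(10.0, 1.0)
    print(f"learned: b={b:.4f}, p={p:.4f} after {rounds} rounds")
    print(f"closed form: b={b_star:.4f}, p={p_star:.4f}")
```

In this sketch the fixed point reached by the repeated play can be checked against `closed_form_equilibrium`, which intersects the two best-response curves directly. Rerunning `damped_best_response` with different `eta` values shows how the number of rounds needed depends on the learning rate in this toy setting.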