Mining in proof-of-work blockchains has become an expensive affair, requiring specialized hardware capable of computing several megahashes per second at high electricity cost. Miners earn a reward each time a block they mine is included in the longest chain, which helps offset their mining costs. Miners therefore seek to maximize the number of their blocks that end up in the blockchain, and with it their revenue. A key factor affecting the rewards a miner earns is its connectivity to other miners in the peer-to-peer network. To maximize rewards, a miner must choose its network connections carefully, ensuring that its paths to other miners have, on average, lower latency than the paths between other miners. We formulate a miner's choice of whom to connect to as a combinatorial bandit problem. Each node picks its neighbors strategically to minimize the latency to reach 90\% of the network's hash power, relative to the 90th-percentile latency from other nodes. A key contribution of our work is the use of a network-coordinates-based model for learning the network structure within the bandit algorithm. Experimentally, we show that our proposed algorithm outperforms or matches baselines across diverse network settings.
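The abstract does not spell out how the "latency to reach 90\% of the hash power" is computed. The sketch below is one plausible reading, not the authors' definition. It assumes a latency-weighted graph, shortest-path (Dijkstra) propagation, and a known hash-power share per miner, and it counts the source's own hash power at latency 0. All function and parameter names are hypothetical. The comparison against other nodes' 90th-percentile latency, the bandit, and the network-coordinates model are not shown.

```python
import heapq


def shortest_latencies(graph, source):
    """Dijkstra over a latency-weighted graph given as {node: {neighbor: latency}}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter path to u was already settled
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def latency_to_hash_fraction(graph, hash_power, source, fraction=0.9):
    """Smallest latency L such that miners within L of `source` hold >= `fraction`
    of the total hash power (assumption: the source's own hash power counts at L = 0)."""
    total = sum(hash_power.values())
    dist = shortest_latencies(graph, source)
    covered = 0.0
    # Visit reachable nodes in order of increasing latency, accumulating hash power.
    for node, d in sorted(dist.items(), key=lambda kv: kv[1]):
        covered += hash_power.get(node, 0.0)
        if covered >= fraction * total:
            return d
    return float("inf")  # unreachable miners hold too much hash power
```

For example, with links A–B (10 ms), B–C (20 ms), A–D (50 ms) and hash shares A 5, B 50, C 30, D 15, node A needs 50 ms to reach 90\% of the hash power while node C needs 80 ms. Under this reading, a neighbor choice that lowers this value is better.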