We study the problem of \textit{online} low-rank matrix completion with $\mathsf{M}$ users, $\mathsf{N}$ items and $\mathsf{T}$ rounds. In each round, we recommend one item per user. For each recommendation, we obtain a (noisy) reward sampled from a low-rank user-item reward matrix. The goal is to design an online method with sub-linear regret (in $\mathsf{T}$). While the problem can be mapped to the standard multi-armed bandit problem where each item is an \textit{independent} arm, it leads to poor regret as the correlation between arms and users is not exploited. In contrast, exploiting the low-rank structure of reward matrix is challenging due to non-convexity of low-rank manifold. We overcome this challenge using an explore-then-commit (ETC) approach that ensures a regret of $O(\mathsf{polylog} (\mathsf{M}+\mathsf{N}) \mathsf{T}^{2/3})$. That is, roughly only $\mathsf{polylog} (\mathsf{M}+\mathsf{N})$ item recommendations are required per user to get non-trivial solution. We further improve our result for the rank-$1$ setting. Here, we propose a novel algorithm OCTAL (Online Collaborative filTering using iterAtive user cLustering) that ensures nearly optimal regret bound of $O(\mathsf{polylog} (\mathsf{M}+\mathsf{N}) \mathsf{T}^{1/2})$. Our algorithm uses a novel technique of clustering users and eliminating items jointly and iteratively, which allows us to obtain nearly minimax optimal rate in $\mathsf{T}$.