In this paper, we examine the fundamental performance limitations of online machine learning by viewing the online learning problem as a prediction problem with causal side information. To this end, we combine the entropic analysis of information theory with the innovations approach of prediction theory to derive generic lower bounds on the prediction errors, together with the conditions (stated in terms of, e.g., directed information) under which the bounds are achieved. In general, no restrictions need to be imposed on the learning algorithms or on the distributions of the data points for the bounds to hold. Moreover, supervised, semi-supervised, and unsupervised learning can all be analyzed within this framework. We also investigate the implications of these results for the fundamental limits of generalization.
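To fix ideas, a minimal sketch of a bound of this type reads as follows (the notation here is illustrative and assumes a mean-square error criterion and scalar data points; the precise statements and assumptions appear in the body of the paper). For any learning algorithm whose prediction $\hat{x}_k$ of the data point $x_k$ is a causal function of the previous data points $x_0^{k-1}$ and the side information $y_0^{k}$, the prediction error satisfies
\[
  \mathbb{E}\!\left[ \left( x_k - \hat{x}_k \right)^2 \right]
  \;\geq\; \frac{1}{2 \pi e}\, 2^{\,2 h\left( x_k \,\middle|\, x_0^{k-1},\, y_0^{k} \right)},
\]
which follows from the maximum-entropy property of the Gaussian distribution applied to the innovation $e_k = x_k - \hat{x}_k$, together with the fact that conditioning on $\left( x_0^{k-1}, y_0^{k} \right)$ shifts $e_k$ by a deterministic function and hence does not change its conditional differential entropy. Equality requires, roughly, that the innovation be Gaussian with zero mean and independent of the past data; it is independence conditions of this kind that the directed-information characterizations make precise.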