This paper presents a novel approach for lecture video indexing using a boosted deep convolutional neural network system. The indexing is performed by matching high quality slide images, for which text is either known or extracted, to lower resolution video frames with possible noise, perspective distortion, and occlusions. We propose a deep neural network integrated with a boosting framework composed of two sub-networks targeting feature extraction and similarity determination to perform the matching. The trained network is given as input a pair of slide image and a candidate video frame image and produces the similarity between them. A boosting framework is integrated into our proposed network during the training process. Experimental results show that the proposed approach is much more capable of handling occlusion, spatial transformations, and other types of noises when compared with known approaches.