Abstract:We present a solution to "Google Cloud and YouTube-8M Video Understanding Challenge" that ranked 5th place. The proposed model is an ensemble of three model families, two frame level and one video level. The training was performed on augmented dataset, with cross validation.