In this paper, we present a model that predicts an emotion intensity score directly from video input, rather than deriving it from action units. We train a 3D deep neural network (DNN) that incorporates dynamic emotion information on videos of different people smiling; the model outputs an intensity score from 0 to 10. Each video is labeled frame by frame with a normalized, action-unit-based intensity score. The model then employs an adaptive learning technique to improve performance on new subjects. Compared to other models, our model excels at generalizing across different people and provides a new framework for directly classifying emotional intensity.
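As an illustration of the framewise labeling step, the following minimal sketch maps per-frame action-unit intensities onto the 0 to 10 score range. It is not the labeling procedure used in this work: the function name `frame_labels`, the assumed raw range of 0 to 5, and the choice of linear min-max scaling with clipping are all assumptions made for the example.

```python
from typing import List, Sequence


def frame_labels(
    au_intensities: Sequence[float],
    au_min: float = 0.0,      # assumed lower bound of the raw AU intensity
    au_max: float = 5.0,      # assumed upper bound of the raw AU intensity
    score_max: float = 10.0,  # top of the target score range (0 to 10)
) -> List[float]:
    """Linearly rescale per-frame AU intensities to [0, score_max].

    Hypothetical helper: values outside [au_min, au_max] are clipped
    before scaling so every label stays inside the target range.
    """
    if au_max <= au_min:
        raise ValueError("au_max must be greater than au_min")

    span = au_max - au_min
    labels = []
    for value in au_intensities:
        clipped = min(max(value, au_min), au_max)
        labels.append(score_max * (clipped - au_min) / span)
    return labels


# One label per frame of a short, made-up clip.
example = frame_labels([0.0, 2.5, 5.0])
print(example)  # [0.0, 5.0, 10.0]
```

Under these assumptions, a raw intensity at the midpoint of the assumed range yields a label of 5, and out-of-range values saturate at 0 or 10.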