Every generation of mobile devices strives to capture video at a higher resolution and frame rate than the previous one. This increase in quality demands additional power and computation to capture and encode the media. We propose a method to reduce the overall power consumption of capturing high-quality video on mobile devices. With video frame interpolation (VFI), the sensor can be driven at a lower frame rate, which reduces sensor power consumption. With modern RGB hybrid event-based vision sensors (EVS), event data can guide the interpolation and yield results of much higher quality. Applied naively, however, interpolation is computationally expensive and produces large amounts of intermediate data before the video is encoded. This paper proposes a video encoder that generates a bitstream for high-frame-rate video without explicit interpolation. The proposed method estimates encoded video data (notably motion vectors) rather than frames, so an encoded video file can be produced directly without explicitly producing intermediate frames.
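
To make the idea of estimating motion vectors instead of frames concrete, the following minimal sketch derives motion vectors for a frame that was never captured, using only the block displacements between two captured key frames. It is an illustration under a simple linear-motion assumption, not the event-guided estimator proposed in this paper; the function name `scale_motion_vectors`, the quarter-pel integer units, and the per-block displacement input are all hypothetical choices made for the example.

```python
def scale_motion_vectors(displacements, t):
    """Derive motion vectors for an uncaptured frame at time t in (0, 1).

    displacements: list of (dx, dy) content displacements per block, measured
        from key frame 0 to key frame 1, in integer quarter-pel units
        (hypothetical input format).
    Returns one (mv_to_prev, mv_to_next) pair per block, where each motion
    vector points from the block in the uncaptured frame to its source
    position in the respective key frame.

    Assumption: motion is linear between the two key frames. No pixel data
    is read or written at any point.
    """
    if not 0.0 < t < 1.0:
        raise ValueError("t must lie strictly between the two key frames")

    result = []
    for dx, dy in displacements:
        # Content moved t * d since key frame 0, so its source lies at -t * d.
        mv_to_prev = (round(-t * dx), round(-t * dy))
        # Content will move a further (1 - t) * d before key frame 1.
        mv_to_next = (round((1.0 - t) * dx), round((1.0 - t) * dy))
        result.append((mv_to_prev, mv_to_next))
    return result


if __name__ == "__main__":
    # Two blocks; the first moves right and up, the second is static.
    key_frame_displacements = [(8, -4), (0, 0)]
    for pair in scale_motion_vectors(key_frame_displacements, t=0.5):
        print(pair)
```

In this sketch the output is a set of motion vectors that an encoder could write into the bitstream for the missing frame, so no intermediate image has to be synthesized or stored. A real encoder would additionally need residuals, mode decisions, and a more accurate motion estimate, which is where event data could help.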