Tiny Machine Learning (TinyML) is a branch of Machine Learning (ML) that constitutes a bridge between the ML world and the embedded system ecosystem (i.e., Internet of Things devices, embedded devices, and edge computing units), enabling the execution of ML algorithms on devices constrained in terms of memory, computational capabilities, and power consumption. Video Streaming Analysis (VSA), one of the most interesting tasks of TinyML, consists in scanning a sequence of frames in a streaming manner, with the goal of identifying interesting patterns. Given the strict constraints of these tiny devices, all the current solutions rely on performing a frame-by-frame analysis, hence not exploiting the temporal component in the stream of data. In this paper, we present StreamTinyNet, the first TinyML architecture to perform multiple-frame VSA, enabling a variety of use cases that requires spatial-temporal analysis that were previously impossible to be carried out at a TinyML level. Experimental results on public-available datasets show the effectiveness and efficiency of the proposed solution. Finally, StreamTinyNet has been ported and tested on the Arduino Nicla Vision, showing the feasibility of what proposed.