Ye-Ji Kim

Three-Stream Fusion Network for First-Person Interaction Recognition

Feb 19, 2020

Ye-Ji Kim, Dong-Gyu Lee, Seong-Whan Lee

Abstract: First-person interaction recognition is a challenging task because of unstable video conditions resulting from the camera wearer's movement. For human interaction recognition from a first-person viewpoint, this paper proposes a three-stream fusion network with two main parts: a three-stream architecture and three-stream correlation fusion. The three-stream architecture captures the characteristics of the target appearance, target motion, and camera ego-motion. Meanwhile, the three-stream correlation fusion combines the feature maps of the three streams to consider the correlations among the target appearance, target motion, and camera ego-motion. The fused feature vector is robust to camera movement and compensates for the noise of the camera ego-motion. Short-term intervals are modeled using the fused feature vector, and a long short-term memory (LSTM) model considers the temporal dynamics of the video. We evaluated the proposed method on two public benchmark datasets to validate the effectiveness of our approach. The experimental results show that the proposed fusion method successfully generated a discriminative feature vector, and our network outperformed all competing activity recognition methods in first-person videos where considerable camera ego-motion occurs.
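The abstract does not spell out how the correlation fusion is computed, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes each stream has already produced a fixed-length feature vector per short-term interval, and it models "correlation" as pairwise element-wise products appended to the concatenated features. The function names `pairwise_product`, `fuse_three_streams`, and `fuse_clip` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def pairwise_product(a: Vector, b: Vector) -> List[float]:
    """Element-wise product of two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return [x * y for x, y in zip(a, b)]


def fuse_three_streams(appearance: Vector, motion: Vector, ego_motion: Vector) -> List[float]:
    """Hypothetical fusion: concatenate the three streams, then append
    the three pairwise element-wise products as a stand-in for
    cross-stream correlation (an assumption, not the paper's operator)."""
    fused = list(appearance) + list(motion) + list(ego_motion)
    fused += pairwise_product(appearance, motion)
    fused += pairwise_product(appearance, ego_motion)
    fused += pairwise_product(motion, ego_motion)
    return fused


def fuse_clip(intervals: Sequence[Tuple[Vector, Vector, Vector]]) -> List[List[float]]:
    """Fuse every short-term interval of a clip into one vector each.
    The resulting sequence is what a recurrent model such as an LSTM
    would consume to capture temporal dynamics."""
    return [fuse_three_streams(a, m, e) for a, m, e in intervals]


if __name__ == "__main__":
    # Two toy intervals with 2-dimensional features per stream.
    clip = [
        ([1.0, 2.0], [3.0, 4.0], [5.0, 6.0]),
        ([0.5, 0.5], [2.0, 2.0], [4.0, 4.0]),
    ]
    for fused in fuse_clip(clip):
        print(len(fused), fused)
```

With d-dimensional stream features, this sketch yields a 6d-dimensional fused vector per interval; the temporal model over the fused sequence is left out here.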

* 30 pages, 9 figures
