Understanding the flow in 3D space of sparsely sampled points between two consecutive time frames is the core stone of modern geometric-driven systems such as VR/AR, Robotics, and Autonomous driving. The lack of real, non-simulated, labeled data for this task emphasizes the importance of self- or un-supervised deep architectures. This work presents a new self-supervised training method and an architecture for the 3D scene flow estimation under occlusions. Here we show that smart multi-layer fusion between flow prediction and occlusion detection outperforms traditional architectures by a large margin for occluded and non-occluded scenarios. We report state-of-the-art results on Flyingthings3D and KITTI datasets for both the supervised and self-supervised training.