This paper presents a vision-based system that enables one unmanned aerial vehicle (UAV), the follower, to detect, track, and follow another UAV, the target. To detect an airborne UAV, we apply the YOLO convolutional neural network, trained on a collected and processed dataset of 10,000 images. The trained network detects various multirotor UAVs in indoor, outdoor, and simulated environments. The detections are then refined with a Kalman filter, which provides steady and reliable estimates of the target UAV's position and velocity. A simple nonlinear controller based on a visual servoing strategy keeps the target UAV within the field of view (FOV) and at the required distance. The proposed system achieves real-time performance on the Neural Compute Stick 2, detecting UAVs at 20 frames per second (FPS). The validity and effectiveness of the developed system are confirmed in a Gazebo simulation experiment in which the target UAV follows a 3D figure-eight trajectory.
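To illustrate how a Kalman filter can turn per-frame detections into steady position and velocity estimates, the following is a minimal sketch and not the filter used in this work. It assumes a constant-velocity motion model applied independently to each coordinate axis, a 20 Hz update rate matching the reported detection speed, and that frames without a detection are bridged by prediction alone. The names `AxisKalmanFilter` and `track`, and the noise values `q` and `r`, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class AxisKalmanFilter:
    """Constant-velocity Kalman filter for one axis; state is (position, velocity)."""
    p: float = 0.0      # position estimate
    v: float = 0.0      # velocity estimate
    # Entries of the symmetric 2x2 covariance [[p_pp, p_pv], [p_pv, p_vv]]
    p_pp: float = 1.0
    p_pv: float = 0.0
    p_vv: float = 1.0
    q: float = 0.5      # assumed acceleration noise variance (illustrative value)
    r: float = 0.04     # assumed measurement noise variance (illustrative value)

    def predict(self, dt: float) -> None:
        # State propagation x = F x with F = [[1, dt], [0, 1]]
        self.p += self.v * dt
        # Covariance propagation P = F P F^T
        p_pp = self.p_pp + 2.0 * dt * self.p_pv + dt * dt * self.p_vv
        p_pv = self.p_pv + dt * self.p_vv
        # Add process noise Q = q * [[dt^4/4, dt^3/2], [dt^3/2, dt^2]]
        self.p_pp = p_pp + self.q * dt**4 / 4.0
        self.p_pv = p_pv + self.q * dt**3 / 2.0
        self.p_vv = self.p_vv + self.q * dt**2

    def update(self, z: float) -> None:
        # Only position is measured, so H = [1, 0]
        s = self.p_pp + self.r          # innovation variance
        k_p = self.p_pp / s             # Kalman gain for position
        k_v = self.p_pv / s             # Kalman gain for velocity
        y = z - self.p                  # innovation
        self.p += k_p * y
        self.v += k_v * y
        # P = (I - K H) P; p_vv is updated first because it needs the old p_pv
        self.p_vv -= k_v * self.p_pv
        self.p_pv *= 1.0 - k_p
        self.p_pp *= 1.0 - k_p


def track(measurements: Iterable[Optional[float]], dt: float) -> List[Tuple[float, float]]:
    """Filter a sequence of position measurements; None marks a missed detection."""
    kf = AxisKalmanFilter()
    estimates = []
    for z in measurements:
        kf.predict(dt)
        if z is not None:
            kf.update(z)
        estimates.append((kf.p, kf.v))
    return estimates


if __name__ == "__main__":
    dt = 1.0 / 20.0  # 20 FPS detector
    # Target moving at 2 m/s along one axis, with a short detection dropout
    zs: List[Optional[float]] = [2.0 * dt * (k + 1) for k in range(200)]
    zs[50:55] = [None] * 5
    position, velocity = track(zs, dt)[-1]
    print(f"position={position:.3f} m, velocity={velocity:.3f} m/s")
```

In this sketch a 3D target would be tracked with three such filters, one per axis. A coupled 3D state or a different motion model may be more appropriate in practice, and the noise parameters would need tuning against real detections.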