Abstract:Image Coding for Machines (ICM) is an image compression technique for image recognition. This technique is essential due to the growing demand for image recognition AI. In this paper, we propose a method for ICM that focuses on encoding and decoding only the edge information of object parts in an image, which we call SA-ICM. This is an Learned Image Compression (LIC) model trained using edge information created by Segment Anything. Our method can be used for image recognition models with various tasks. SA-ICM is also robust to changes in input data, making it effective for a variety of use cases. Additionally, our method provides benefits from a privacy point of view, as it removes human facial information on the encoder's side, thus protecting one's privacy. Furthermore, this LIC model training method can be used to train Neural Representations for Videos (NeRV), which is a video compression model. By training NeRV using edge information created by Segment Anything, it is possible to create a NeRV that is effective for image recognition (SA-NeRV). Experimental results confirm the advantages of SA-ICM, presenting the best performance in image compression for image recognition. We also show that SA-NeRV is superior to ordinary NeRV in video compression for machines.
Abstract:Compression technology is essential for efficient image transmission and storage. With the rapid advances in deep learning, images are beginning to be used for image recognition as well as for human vision. For this reason, research has been conducted on image coding for image recognition, and this field is called Image Coding for Machines (ICM). There are two main approaches in ICM: the ROI-based approach and the task-loss-based approach. The former approach has the problem of requiring an ROI-map as input in addition to the input image. The latter approach has the problems of difficulty in learning the task-loss, and lack of robustness because the specific image recognition model is used to compute the loss function. To solve these problems, we propose an image compression model that learns object regions. Our model does not require additional information as input, such as an ROI-map, and does not use task-loss. Therefore, it is possible to compress images for various image recognition models. In the experiments, we demonstrate the versatility of the proposed method by using three different image recognition models and three different datasets. In addition, we verify the effectiveness of our model by comparing it with previous methods.
Abstract:In recent years, video analysis using Artificial Intelligence (AI) has been widely used, due to the remarkable development of image recognition technology using deep learning. In 2019, the Moving Picture Experts Group (MPEG) has started standardization of Video Coding for Machines (VCM) as a video coding technology for image recognition. In the framework of VCM, both higher image recognition accuracy and video compression performance are required. In this paper, we propose an extention scheme of video coding for object detection using Versatile Video Coding (VVC). Unlike video for human vision, video used for object detection does not require a large image size or high contrast. Since downsampling of the image can reduce the amount of information to be transmitted. Due to the decrease in image contrast, entropy of the image becomes smaller. Therefore, in our proposed scheme, the original image is reduced in size and contrast, then coded with VVC encoder to achieve high compression performance. Then, the output image from the VVC decoder is restored to its original image size using the bicubic method. Experimental results show that the proposed video coding scheme achieves better coding performance than regular VVC in terms of object detection accuracy.
Abstract:With advances in image recognition technology based on deep learning, automatic video analysis by Artificial Intelligence is becoming more widespread. As the amount of video used for image recognition increases, efficient compression methods for such video data are necessary. In general, when the image quality deteriorates due to image encoding, the image recognition accuracy also falls. Therefore, in this paper, we propose a neural-network-based approach to improve image recognition accuracy, especially the object detection accuracy by applying post-processing to the encoded video. Versatile Video Coding (VVC) will be used for the video compression method, since it is the latest video coding method with the best encoding performance. The neural network is trained using the features of YOLO-v7, the latest object detection model. By using VVC as the video coding method and YOLO-v7 as the detection model, high object detection accuracy is achieved even at low bit rates. Experimental results show that the combination of the proposed method and VVC achieves better coding performance than regular VVC in object detection accuracy.