Abstract: This paper presents a novel approach to unsupervised video summarization using reinforcement learning. It addresses key limitations of current unsupervised methods, namely the unstable training of adversarial generator-discriminator architectures and the reliance on hand-crafted reward functions for quality evaluation. The proposed method is based on the idea that a concise and informative summary should yield a reconstructed video that closely resembles the original. The summarizer model assigns an importance score to each frame and generates a video summary. Reinforcement learning, coupled with a novel reward generation pipeline, is employed to train the summarizer. The reward generation pipeline trains the summarizer to create summaries that lead to better reconstructions. It comprises a generator model that reconstructs masked frames from a partially masked video, along with a reward mechanism that compares the video reconstructed from the summary against the original. The video generator is trained in a self-supervised manner to reconstruct randomly masked frames, which in turn strengthens the reward signal that guides the summarizer toward accurate summaries. This training pipeline yields a summarizer that mimics human-generated video summaries more closely than methods relying on hand-crafted rewards. Unlike adversarial architectures, the training process consists of two stable and isolated training steps. Experimental results demonstrate promising performance, with F-scores of 62.3 and 54.5 on the TVSum and SumMe datasets, respectively. In addition, the inference stage is 300 times faster than that of our previously reported state-of-the-art method.
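The core of the abstract above is a reward computed from how well a generator can rebuild the full video from the summary alone. Below is a minimal sketch of that reconstruction-based reward, assuming per-frame features, a toy summarizer, and a stand-in generator; all module names, sizes, and the cosine-similarity reward are illustrative assumptions, not the paper's implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

T, D = 64, 256                      # number of frames, feature dimension
summary_ratio = 0.15                # fraction of frames kept in the summary

summarizer = nn.Sequential(nn.Linear(D, 128), nn.ReLU(), nn.Linear(128, 1))
generator = nn.Sequential(nn.Linear(D, D))   # stand-in for the masked-frame generator

frames = torch.randn(T, D)          # per-frame features of the original video

# 1. Score frames and keep the top-k as the summary (the rest are masked).
scores = summarizer(frames).squeeze(-1)              # (T,)
k = max(1, int(summary_ratio * T))
keep = torch.topk(scores, k).indices
mask = torch.zeros(T, dtype=torch.bool)
mask[keep] = True

# 2. Reconstruct the full video from the partially masked input.
masked = frames * mask.unsqueeze(-1).float()         # non-summary frames zeroed out
reconstruction = generator(masked)                   # (T, D)

# 3. Reward: how closely the reconstruction matches the original video.
reward = F.cosine_similarity(reconstruction, frames, dim=-1).mean()
print(f"reward = {reward.item():.4f}")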
Abstract: Recent advancements in decentralized learning, such as Federated Learning (FL), Split Learning (SL), and Split Federated Learning (SplitFed), have expanded the potential of machine learning. SplitFed aims to minimize the computational burden on individual clients in FL and to parallelize SL while maintaining privacy. This study investigates the resilience of SplitFed to packet loss at the model split points. It explores various parameter aggregation strategies of SplitFed by examining the impact of splitting the model at different points, either a shallow split or a deep split, on the final global model performance. The experiments, conducted on a human embryo image segmentation task, reveal a statistically significant advantage of a deeper split point.
Abstract: Decentralized machine learning has recently broadened its scope with the invention of Federated Learning (FL), Split Learning (SL), and hybrids such as Split Federated Learning (SplitFed or SFL). The goal of SFL is to reduce the computational power required by each client in FL and to parallelize SL while maintaining privacy. This paper investigates the robustness of SFL against packet loss on the communication links. The performance of various SFL aggregation strategies is examined by splitting the model at two points, a shallow split and a deep split, and testing whether the split point makes a statistically significant difference to the accuracy of the final model. Experiments carried out on a segmentation model for human embryo images indicate a statistically significant advantage of the deeper split point.
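The two abstracts above study how losing parts of the transmitted activations at the split point affects the final model. The sketch below illustrates that setup under stated assumptions: a toy layer stack split at a shallow or deep point, with packet loss simulated by zeroing random chunks of the "smashed" activations. The split positions and the loss model are illustrative, not the papers' exact protocol.

import torch
import torch.nn as nn

layers = [
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 1),             # e.g. a one-channel segmentation logit
]

def split_model(split_idx):
    """Return (client part, server part) of the layer stack."""
    return nn.Sequential(*layers[:split_idx]), nn.Sequential(*layers[split_idx:])

def drop_packets(activations, loss_rate, packet_elems=256):
    """Zero out random contiguous chunks ('packets') of the flattened tensor."""
    flat = activations.flatten().clone()
    n_packets = (flat.numel() + packet_elems - 1) // packet_elems
    lost = torch.rand(n_packets) < loss_rate
    for p in torch.nonzero(lost).flatten().tolist():
        flat[p * packet_elems:(p + 1) * packet_elems] = 0.0
    return flat.view_as(activations)

x = torch.randn(1, 3, 64, 64)
for name, split_idx in [("shallow split", 2), ("deep split", 6)]:
    client, server = split_model(split_idx)
    smashed = client(x)                           # activations sent to the server
    received = drop_packets(smashed, loss_rate=0.1)
    out = server(received)
    print(name, tuple(out.shape))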
Abstract: SplitFed Learning, a combination of Federated Learning and Split Learning (FL and SL), is one of the most recent developments in decentralized machine learning. In SplitFed learning, a model is trained collaboratively by the clients and a server. For image segmentation, labels are created at each client independently and are therefore subject to client bias, inaccuracies, and inconsistencies. In this paper, we propose a data quality-based adaptive averaging strategy for SplitFed learning, called QA-SplitFed, to cope with the variation of annotated ground-truth (GT) quality across clients. The proposed method is compared against five state-of-the-art model averaging methods on the task of learning human embryo image segmentation. Our experiments show that all five baseline methods fail to maintain accuracy as the number of corrupted clients increases. QA-SplitFed, however, copes effectively with corruption as long as at least one client remains uncorrupted.
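To make the idea of quality-based adaptive averaging concrete, here is a minimal sketch that averages client model parameters with weights proportional to a per-client annotation-quality score. The simple normalized weighting below is an assumption for illustration and is not necessarily the exact QA-SplitFed update rule.

import torch
import torch.nn as nn

def quality_weighted_average(state_dicts, quality_scores):
    """Average client state_dicts with weights proportional to their quality scores."""
    weights = torch.tensor(quality_scores, dtype=torch.float32)
    weights = weights / weights.sum()
    averaged = {}
    for key in state_dicts[0]:
        stacked = torch.stack([sd[key].float() for sd in state_dicts], dim=0)
        shaped = weights.view(-1, *([1] * (stacked.dim() - 1)))
        averaged[key] = (shaped * stacked).sum(dim=0)
    return averaged

# Toy example: three clients, one with low-quality (e.g. corrupted) labels.
clients = [nn.Linear(4, 2) for _ in range(3)]
quality = [0.90, 0.85, 0.10]          # the low score down-weights the corrupted client
global_state = quality_weighted_average([c.state_dict() for c in clients], quality)
global_model = nn.Linear(4, 2)
global_model.load_state_dict(global_state)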
Abstract: Time Series Classification (TSC) is an important and challenging task for many visual computing applications. Despite the extensive range of methods developed for TSC, relatively few utilize Deep Neural Networks (DNNs). In this paper, we propose two novel attention blocks, Global Temporal Attention and Temporal Pseudo-Gaussian augmented Self-Attention (TPS), that can enhance deep learning-based TSC approaches, even when those approaches are designed and optimized for a specific dataset or task. We validate this claim by evaluating multiple state-of-the-art deep learning-based TSC models on the University of East Anglia (UEA) benchmark, a standardized collection of 30 Multivariate Time Series Classification (MTSC) datasets. We show that adding the proposed attention blocks improves the base models' average accuracy by up to 3.6%. Additionally, the TPS block uses a new injection module to incorporate relative positional information into transformers. As a standalone unit with lower computational complexity, TPS outperforms most state-of-the-art DNN-based TSC methods. The source code for our experimental setups and the proposed attention blocks is publicly available.
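As a rough illustration of what a global attention block over the time axis looks like for multivariate time series of shape (batch, time, channels), the sketch below re-weights time steps with globally normalized attention scores. The design and names are assumptions for illustration; the GTA/TPS blocks in the paper may differ in detail.

import torch
import torch.nn as nn

class GlobalTemporalAttention(nn.Module):
    """Re-weights time steps with softmax attention scores computed over the whole series."""
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(channels, channels // 2),
            nn.Tanh(),
            nn.Linear(channels // 2, 1),
        )

    def forward(self, x):                            # x: (B, T, C)
        attn = torch.softmax(self.score(x), dim=1)   # (B, T, 1), sums to 1 over time
        return x * attn                              # emphasize informative time steps

x = torch.randn(8, 120, 64)                          # 8 series, 120 steps, 64 channels
block = GlobalTemporalAttention(64)
print(block(x).shape)                                # torch.Size([8, 120, 64])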
Abstract: Removing the effect of illumination variation in images has been shown to be beneficial in many computer vision applications, such as object recognition and semantic segmentation. Although the generation of illumination-invariant images has been studied before, it has not been investigated on real 4-channel data. In this study, we examine the quality of illumination-invariant images generated from red, green, blue, and near-infrared (RGBN) data. Our experiments show that the near-infrared channel contributes substantially to removing illumination. As shown by our numerical and visual results, the illumination-invariant image obtained from RGBN data is superior to that obtained from RGB alone.
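For context, one widely used illumination-invariant transform for RGB images is a log-chromaticity combination with a camera-dependent alpha; the sketch below shows that transform plus a purely hypothetical way of folding the NIR channel in. The NIR handling is an illustrative assumption and is not the method described in the abstract.

import numpy as np

def illumination_invariant_rgb(rgb, alpha=0.48):
    """rgb: float array of shape (H, W, 3), values in (0, 1]; alpha is camera-dependent."""
    eps = 1e-6
    r, g, b = rgb[..., 0] + eps, rgb[..., 1] + eps, rgb[..., 2] + eps
    return 0.5 + np.log(g) - alpha * np.log(b) - (1.0 - alpha) * np.log(r)

def illumination_invariant_rgbn(rgbn, alpha=0.48):
    """Hypothetical illustration: normalize the visible channels by NIR before the transform."""
    eps = 1e-6
    nir = rgbn[..., 3:4] + eps
    normalized = rgbn[..., :3] / nir
    normalized = normalized / (normalized.max() + eps)   # rescale into (0, 1]
    return illumination_invariant_rgb(normalized, alpha)

rgbn = np.random.rand(128, 128, 4).astype(np.float32) + 1e-3
print(illumination_invariant_rgbn(rgbn).shape)           # (128, 128)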
Abstract: Cloud segmentation is one of the fundamental steps in optical remote sensing image analysis. Current methods for identifying cloud regions in aerial or satellite images are not sufficiently accurate, especially in the presence of snow and haze. This paper presents a deep learning-based framework for cloud detection in Landsat 8 imagery. The proposed method benefits from a convolutional neural network (Cloud-Net+) with multiple blocks, trained with a novel loss function (Filtered Jaccard loss). The proposed loss function is more sensitive to the absence of cloud pixels in an image and penalizes or rewards the predicted mask more accurately. The combination of Cloud-Net+ and the Filtered Jaccard loss delivers superior results on four public cloud detection datasets. Our experiments on one of the most common public datasets in computer vision, the Pascal VOC dataset, show that the proposed network and loss function can also be used in other segmentation tasks for more accurate performance and evaluation.
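To illustrate the kind of loss being described, here is a hedged sketch of a soft Jaccard (IoU) loss with an explicit branch for cloud-free images, so the loss stays meaningful when the ground truth contains no cloud pixels. The exact Filtered Jaccard loss in the paper may differ; the cloud-free handling below is an assumed stand-in.

import torch

def soft_jaccard_loss(pred, target, eps=1e-6):
    """pred, target: tensors in [0, 1] of shape (B, H, W)."""
    dims = (1, 2)
    inter = (pred * target).sum(dims)
    union = pred.sum(dims) + target.sum(dims) - inter
    jaccard = (inter + eps) / (union + eps)

    # For cloud-free ground truths the Jaccard term is uninformative, so fall
    # back to penalizing predicted cloud pixels directly (assumed behaviour).
    cloud_free = target.sum(dims) == 0
    false_positive_rate = pred.sum(dims) / (pred.shape[1] * pred.shape[2])
    loss = torch.where(cloud_free, false_positive_rate, 1.0 - jaccard)
    return loss.mean()

pred = torch.rand(4, 64, 64)                     # predicted cloud probabilities
target = (torch.rand(4, 64, 64) > 0.7).float()
target[0] = 0.0                                  # one cloud-free image in the batch
print(soft_jaccard_loss(pred, target).item())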
Abstract: We present a novel method for identifying the boundaries of embryonic cells (blastomeres) in Hoffman Modulation Contrast (HMC) microscopic images taken between day one and day three. Identifying blastomere boundaries is a challenging task, especially in cases containing four or more cells, because the cells are bundled tightly inside the embryo's membrane, and any 2D image projection of such a 3D embryo includes cell overlaps, occlusions, and projection ambiguities. Moreover, human embryos exhibit fragmentation, which does not conform to any specific pattern or shape. Here, we developed a model-based iterative approach in which blastomeres are modeled as ellipses that conform to local image features, such as edges and normals. In an iterative process, each image feature contributes to only one candidate and is removed once it is associated with a model candidate. We tested the proposed algorithm on an image dataset comprising 468 human embryos obtained from different sources, achieving an overall precision, sensitivity, and overall quality (OQ) of 92%, 88%, and 83%, respectively.
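A rough sketch of the iterative "fit an ellipse, claim its features, remove them" idea follows, using RANSAC ellipse fitting from scikit-image as a stand-in for the paper's model-to-feature association. The synthetic image, edge extraction, and parameters are illustrative assumptions, not the authors' exact pipeline.

import numpy as np
from skimage import draw, feature, measure

# Synthetic image with two overlapping "blastomeres" (filled ellipses).
image = np.zeros((200, 200))
rr, cc = draw.ellipse(80, 90, 45, 35)
image[rr, cc] = 0.6
rr, cc = draw.ellipse(120, 120, 40, 30)
image[rr, cc] = 1.0

# Edge features that candidate ellipses must conform to.
edges = feature.canny(image, sigma=2.0)
points = np.column_stack(np.nonzero(edges)).astype(float)   # (row, col) pairs

candidates = []
for _ in range(2):                                  # expected number of cells
    if len(points) < 20:
        break
    model, inliers = measure.ransac(points, measure.EllipseModel,
                                    min_samples=10, residual_threshold=2.0,
                                    max_trials=500)
    if model is None:
        break
    candidates.append(model.params)                 # (xc, yc, a, b, theta) in point coordinates
    points = points[~inliers]                       # each feature serves only one candidate

for i, params in enumerate(candidates):
    print(f"cell {i}: center=({params[0]:.1f}, {params[1]:.1f})")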
Abstract: Cloud detection in satellite images is an important first step in many remote sensing applications. The problem is more challenging when only a limited number of spectral bands are available. To address it, a deep learning-based algorithm is proposed in this paper. The algorithm consists of a Fully Convolutional Network (FCN), called Cloud-Net, that is trained on multiple patches of Landsat 8 images and is capable of capturing both global and local cloud features using its convolutional blocks. Since the proposed method is an end-to-end solution, no complicated pre-processing step is required. Our experimental results show that the proposed method outperforms the state-of-the-art method on a benchmark dataset by 8.7% in Jaccard index.
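For reference, the Jaccard index used as the evaluation metric above is the intersection over union of the predicted and ground-truth cloud masks; a small sketch is shown below with randomly generated stand-in masks.

import numpy as np

def jaccard_index(pred_mask, gt_mask):
    """Binary masks of identical shape; returns intersection over union in [0, 1]."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return 1.0 if union == 0 else intersection / union

pred = np.random.rand(384, 384) > 0.5      # stand-in for a predicted cloud mask
gt = np.random.rand(384, 384) > 0.5        # stand-in for the ground-truth mask
print(f"Jaccard index: {jaccard_index(pred, gt):.3f}")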
Abstract: This paper presents a deep learning-based framework for accurate cloud detection in remote sensing images. The framework benefits from a Fully Convolutional Network (FCN), which is capable of pixel-level labeling of cloud regions in a Landsat 8 image. In addition, a gradient-based approach is proposed to identify and exclude regions of snow/ice in the ground truths of the training set. We show that using a hybrid of the two methods (threshold-based and deep learning-based) improves the performance of the cloud identification process without the need to manually correct automatically generated ground truths. On average, the Jaccard index and recall are improved by 4.36% and 3.62%, respectively.
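The following is a heavily hedged sketch of a gradient-based filter that flags bright, low-gradient regions as a rough snow/ice cue so they can be excluded from automatically generated cloud ground truths. The thresholds and the exact criterion are assumptions for illustration only, not the paper's method.

import numpy as np

def snow_ice_candidate_mask(band, brightness_thresh=0.6, gradient_thresh=0.05):
    """band: 2D reflectance-like array scaled to [0, 1]; returns a boolean exclusion mask."""
    gy, gx = np.gradient(band.astype(float))
    gradient_mag = np.hypot(gx, gy)
    # Bright but texturally smooth pixels are treated as suspected snow/ice.
    return (band > brightness_thresh) & (gradient_mag < gradient_thresh)

band = np.random.rand(256, 256)                 # stand-in spectral band
auto_gt = np.random.rand(256, 256) > 0.5        # stand-in automatic cloud mask
exclude = snow_ice_candidate_mask(band)
cleaned_gt = auto_gt & ~exclude                 # drop suspected snow/ice pixels from the GT
print(f"excluded pixels: {exclude.sum()}")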