Abstract:The combination of semi-supervised learning (SemiSL) and contrastive learning (CL) has been successful in medical image segmentation with limited annotations. However, these works often rely on pretext tasks that lack the specificity required for pixel-level segmentation, and still face overfitting issues due to insufficient supervision signals resulting from too few annotations. Therefore, this paper proposes an affinity-graph-guided semi-supervised contrastive learning framework (Semi-AGCL) by establishing additional affinity-graph-based supervision signals between the student and teacher network, to achieve medical image segmentation with minimal annotations without pretext. The framework first designs an average-patch-entropy-driven inter-patch sampling method, which can provide a robust initial feature space without relying on pretext tasks. Furthermore, the framework designs an affinity-graph-guided loss function, which can improve the quality of the learned representation and the model generalization ability by exploiting the inherent structure of the data, thus mitigating overfitting. Our experiments indicate that with merely 10% of the complete annotation set, our model approaches the accuracy of the fully annotated baseline, manifesting a marginal deviation of only 2.52%. Under the stringent conditions where only 5% of the annotations are employed, our model exhibits a significant enhancement in performance surpassing the second best baseline by 23.09% on the dice metric and achieving an improvement of 26.57% on the notably arduous CRAG and ACDC datasets.
Abstract:Passively cooled base stations (PCBSs) have emerged to deliver better cost and energy efficiency. However, passive cooling necessitates intelligent thermal control via traffic management, i.e., the instantaneous data traffic or throughput of a PCBS directly impacts its thermal performance. This is particularly challenging for outdoor deployment of PCBSs because the heat dissipation efficiency is uncertain and fluctuates over time. What is more, the PCBSs are interference-coupled in multi-cell scenarios. Thus, a higher-throughput PCBS leads to higher interference to the other PCBSs, which, in turn, would require more resource consumption to meet their respective throughput targets. In this paper, we address online decision-making for maximizing the total downlink throughput for a multi-PCBS system subject to constraints related on operating temperature. We demonstrate that a reinforcement learning (RL) approach, specifically soft actor-critic (SAC), can successfully perform throughput maximization while keeping the PCBSs cool, by adapting the throughput to time-varying heat dissipation conditions. Furthermore, we design a denial and reward mechanism that effectively mitigates the risk of overheating during the exploration phase of RL. Simulation results show that our approach achieves up to 88.6% of the global optimum. This is very promising, as our approach operates without prior knowledge of future heat dissipation efficiency, which is required by the global optimum.
Abstract:Due to the lack of large-scale labeled Thermal InfraRed (TIR) training datasets, most existing TIR trackers are trained directly on RGB datasets. However, tracking methods trained on RGB datasets suffer a significant drop-off in TIR data due to the domain shift issue. To this end, in this work, we propose a Progressive Domain Adaptation framework for TIR Tracking (PDAT), which transfers useful knowledge learned from RGB tracking to TIR tracking. The framework makes full use of large-scale labeled RGB datasets without requiring time-consuming and labor-intensive labeling of large-scale TIR data. Specifically, we first propose an adversarial-based global domain adaptation module to reduce domain gap on the feature level coarsely. Second, we design a clustering-based subdomain adaptation method to further align the feature distributions of the RGB and TIR datasets finely. These two domain adaptation modules gradually eliminate the discrepancy between the two domains, and thus learn domain-invariant fine-grained features through progressive training. Additionally, we collect a largescale TIR dataset with over 1.48 million unlabeled TIR images for training the proposed domain adaptation framework. Experimental results on five TIR tracking benchmarks show that the proposed method gains a nearly 6% success rate, demonstrating its effectiveness.
Abstract:Mobile edge computing (MEC) has been regarded as a promising approach to deal with explosive computation requirements by enabling cloud computing capabilities at the edge of networks. Existing models of MEC impose some strong assumptions on the known processing cycles and unintermittent communications. However, practical MEC systems are constrained by various uncertainties and intermittent communications, rendering these assumptions impractical. In view of this, we investigate how to schedule task offloading in MEC systems with uncertainties. First, we derive a closed-form expression of the average offloading success probability in a device-to-device (D2D) assisted MEC system with uncertain computation processing cycles and intermittent communications. Then, we formulate a task offloading maximization problem (TOMP), and prove that the problem is NP-hard. For problem solving, if the problem instance exhibits a symmetric structure, we propose a task scheduling algorithm based on dynamic programming (TSDP). By solving this problem instance, we derive a bound to benchmark sub-optimal algorithm. For general scenarios, by reformulating the problem, we propose a repeated matching algorithm (RMA). Finally, in performance evaluations, we validate the accuracy of the closed-form expression of the average offloading success probability by Monte Carlo simulations, as well as the effectiveness of the proposed algorithms.
Abstract:Convolutional neural networks (CNNs) have been successfully applied to the single target tracking task in recent years. Generally, training a deep CNN model requires numerous labeled training samples, and the number and quality of these samples directly affect the representational capability of the trained model. However, this approach is restrictive in practice, because manually labeling such a large number of training samples is time-consuming and prohibitively expensive. In this paper, we propose an active learning method for deep visual tracking, which selects and annotates the unlabeled samples to train the deep CNNs model. Under the guidance of active learning, the tracker based on the trained deep CNNs model can achieve competitive tracking performance while reducing the labeling cost. More specifically, to ensure the diversity of selected samples, we propose an active learning method based on multi-frame collaboration to select those training samples that should be and need to be annotated. Meanwhile, considering the representativeness of these selected samples, we adopt a nearest neighbor discrimination method based on the average nearest neighbor distance to screen isolated samples and low-quality samples. Therefore, the training samples subset selected based on our method requires only a given budget to maintain the diversity and representativeness of the entire sample set. Furthermore, we adopt a Tversky loss to improve the bounding box estimation of our tracker, which can ensure that the tracker achieves more accurate target states. Extensive experimental results confirm that our active learning-based tracker (ALT) achieves competitive tracking accuracy and speed compared with state-of-the-art trackers on the seven most challenging evaluation benchmarks.
Abstract:The current Siamese network based on region proposal network (RPN) has attracted great attention in visual tracking due to its excellent accuracy and high efficiency. However, the design of the RPN involves the selection of the number, scale, and aspect ratios of anchor boxes, which will affect the applicability and convenience of the model. Furthermore, these anchor boxes require complicated calculations, such as calculating their intersection-over-union (IoU) with ground truth bounding boxes.Due to the problems related to anchor boxes, we propose a simple yet effective anchor-free tracker (named Siamese corner networks, SiamCorners), which is end-to-end trained offline on large-scale image pairs. Specifically, we introduce a modified corner pooling layer to convert the bounding box estimate of the target into a pair of corner predictions (the bottom-right and the top-left corners). By tracking a target as a pair of corners, we avoid the need to design the anchor boxes. This will make the entire tracking algorithm more flexible and simple than anchorbased trackers. In our network design, we further introduce a layer-wise feature aggregation strategy that enables the corner pooling module to predict multiple corners for a tracking target in deep networks. We then introduce a new penalty term that is used to select an optimal tracking box in these candidate corners. Finally, SiamCorners achieves experimental results that are comparable to the state-of-art tracker while maintaining a high running speed. In particular, SiamCorners achieves a 53.7% AUC on NFS30 and a 61.4% AUC on UAV123, while still running at 42 frames per second (FPS).
Abstract:This paper considers the motion energy minimization problem for a wirelessly connected robot using millimeter-wave (mm-wave) communications. These are assisted by an intelligent reflective surface (IRS) that enhances the coverage at such high frequencies characterized by high blockage sensitivity. The robot is subject to time and uplink communication quality of service (QoS) constraints. This is a fundamental problem in fully automated factories that characterize Industry 4.0, where robots may have to perform tasks with given deadlines while maximizing the battery autonomy and communication efficiency. To account for the mutual dependence between robot position and communication QoS, we propose a joint optimization of robot trajectory and beamforming at the IRS and access point (AP). We present a solution that first exploits mm-wave channel characteristics to decouple beamforming and trajectory optimization. Then, the latter is solved by a successive-convex optimization-based algorithm. The algorithm takes into account the obstacles' positions and a radio map to avoid collisions and poorly covered areas. We prove that the algorithm can converge to a solution satisfying the Karush-Kuhn-Tucker (KKT) conditions. The simulation results show a dramatic reduction of the motion energy consumption with respect to methods that aim to find maximum-rate trajectories. Moreover, we show how the IRS and the beamforming optimization improve the motion energy efficiency of the robot.
Abstract:Because the expansion path of U-Net may ignore the characteristics of small targets, intermediate supervision mechanism is proposed. The original mask is also entered into the network as a label for intermediate output. However, U-Net is mainly engaged in segmentation, and the extracted features are also targeted at segmentation location information, and the input and output are different. The label we need is that the input and output are both original masks, which is more similar to the refactoring process, so we propose another intermediate supervision mechanism. However, the features extracted by the contraction path of this intermediate monitoring mechanism are not necessarily consistent. For example, U-Net's contraction path extracts transverse features, while auto-encoder extracts longitudinal features, which may cause the output of the expansion path to be inconsistent with the label. Therefore, we put forward the intermediate supervision mechanism of shared-weight decoder module. Although the intermediate supervision mechanism improves the segmentation accuracy, the training time is too long due to the extra input and multiple loss functions. For one of these problems, we have introduced tied-weight decoder. To reduce the redundancy of the model, we combine shared-weight decoder module with tied-weight decoder module.
Abstract:In this paper, we present a Large-Scale and high-diversity general Thermal InfraRed (TIR) Object Tracking Benchmark, called LSOTBTIR, which consists of an evaluation dataset and a training dataset with a total of 1,400 TIR sequences and more than 600K frames. We annotate the bounding box of objects in every frame of all sequences and generate over 730K bounding boxes in total. To the best of our knowledge, LSOTB-TIR is the largest and most diverse TIR object tracking benchmark to date. To evaluate a tracker on different attributes, we define 4 scenario attributes and 12 challenge attributes in the evaluation dataset. By releasing LSOTB-TIR, we encourage the community to develop deep learning based TIR trackers and evaluate them fairly and comprehensively. We evaluate and analyze more than 30 trackers on LSOTB-TIR to provide a series of baselines, and the results show that deep trackers achieve promising performance. Furthermore, we re-train several representative deep trackers on LSOTB-TIR, and their results demonstrate that the proposed training dataset significantly improves the performance of deep TIR trackers. Codes and dataset are available at https://github.com/QiaoLiuHit/LSOTB-TIR.
Abstract:Most existing tracking methods are based on using a classifier and multi-scale estimation to estimate the state of the target. Consequently, and as expected, trackers have become more stable while tracking accuracy has stagnated. While the ATOM \cite{ATOM} tracker adopts a maximum overlap method based on an intersection-over-union (IoU) loss to mitigate this problem, there are defects in the IoU loss itself, that make it impossible to continue to optimize the objective function when a given bounding box is completely contained within another bounding box; this makes it very challenging to accurately estimate the target state. Accordingly, in this paper, we address the above-mentioned problem by proposing a novel tracking method based on a distance-IoU (DIoU) loss, such that the proposed tracker consists of a target estimation component and a target classification component. The target estimation component is trained to predict the DIoU score between the target ground-truth bounding-box and the estimated bounding-box. The DIoU loss can maintain the advantage provided by the IoU loss while minimizing the distance between the center points of two bounding boxes, thereby making the target estimation more accurate. Moreover, we introduce a classification component that is trained online to guarantee real-time tracking speed. Comprehensive experimental results demonstrate that our DIoUTrack achieves competitive tracking accuracy when compared with state-of-the-art trackers while also tracking speed is over $50$ $fps$.