Abstract:The rapid development of diffusion models (DMs) has significantly advanced image and video applications, making "what you want is what you see" a reality. Among these, video editing has gained substantial attention and seen a swift rise in research activity, necessitating a comprehensive and systematic review of the existing literature. This paper reviews diffusion model-based video editing techniques, including theoretical foundations and practical applications. We begin by overviewing the mathematical formulation and image domain's key methods. Subsequently, we categorize video editing approaches by the inherent connections of their core technologies, depicting evolutionary trajectory. This paper also dives into novel applications, including point-based editing and pose-guided human video editing. Additionally, we present a comprehensive comparison using our newly introduced V2VBench. Building on the progress achieved to date, the paper concludes with ongoing challenges and potential directions for future research.
Abstract:Existing approaches towards anomaly detection~(AD) often rely on a substantial amount of anomaly-free data to train representation and density models. However, large anomaly-free datasets may not always be available before the inference stage; in which case an anomaly detection model must be trained with only a handful of normal samples, a.k.a. few-shot anomaly detection (FSAD). In this paper, we propose a novel methodology to address the challenge of FSAD which incorporates two important techniques. Firstly, we employ a model pre-trained on a large source dataset to initialize model weights. Secondly, to ameliorate the covariate shift between source and target domains, we adopt contrastive training to fine-tune on the few-shot target domain data. To learn suitable representations for the downstream AD task, we additionally incorporate cross-instance positive pairs to encourage a tight cluster of the normal samples, and negative pairs for better separation between normal and synthesized negative samples. We evaluate few-shot anomaly detection on on 3 controlled AD tasks and 4 real-world AD tasks to demonstrate the effectiveness of the proposed method.
Abstract:The 3GPP has recently conducted a study on the Ambient Internet of Things (AIoT), with a particular emphasis on examining backscatter communications as one of the primary techniques under consideration. Previous investigations into Ambient Backscatter Communications (AmBC) within the long term evolution (LTE) downlink have shown that it is feasible to utilize the user equipment channel estimator as a receiver for demodulating frequency shift keyed (FSK) messages transmitted by the backscatter devices. In practical deployment scenarios, the backscattered link often experiences a low signal-to-noise ratio, leading to subpar bit error rate (BER) performance in the case of uncoded transmissions. In this paper, we propose the adoption of the same convolutional coding methodology for backscatter links that is already employed for LTE downlink control signals. This approach facilitates the reuse of identical demodulation functions at the modem for both control signals and backscattered AIoT messages. To assess the performance of the proposed scheme, we conducted experiments utilizing real LTE downlink signals generated by a mobile operator within an office environment. When compared to uncoded FSK, convolutional channel coding delivers a notable gain of approximately 6 dB at a BER of $10^{-3}$. Consequently, the AmBC system demonstrates a high level of reliability, achieving a BER of $10^{-3}$ at a Signal-to-Noise Ratio (SNR) of 5 dB.
Abstract:Long Term Evolution (LTE) signal is ubiquitously present in electromagnetic (EM) background environment, which make it an attractive signal source for the ambient backscatter communications (AmBC). In this paper, we propose a system, in which a backscatter device (BD) introduces artificial Doppler shift to the channel which is larger than the natural Doppler but still small enough such that it can be tracked by the channel estimator at the User Equipment (UE). Channel estimation is done using the downlink cell specific reference signals (CRS) that are present regardless the UE being attached to the network or not. FSK was selected due to its robust operation in a fading channel. We describe the whole AmBC system, use two receivers. Finally, numerical simulations and measurements are provided to validate the proposed FSK AmBC performance.
Abstract:Long Term Evolution (LTE) systems provide ubiquitous coverage for mobile communications, which makes it a promising candidate to be used as a signal source in the ambient backscatter communications. In this paper, we propose a system in which a backscatter device modulates the ambient LTE signal by changing its reflection coefficient and the receiver uses the LTE Cell Specific Reference Signals (CRS) to estimate the channel and demodulates the backscattered signal from the obtained channel impulse response estimates. We first outline the overall system, discuss the receiver operation, and then provide experimental evidence on the practicality of the proposed system.
Abstract:Semi-supervised learning (SSL) addresses the lack of labeled data by exploiting large unlabeled data through pseudolabeling. However, in the extremely low-label regime, pseudo labels could be incorrect, a.k.a. the confirmation bias, and the pseudo labels will in turn harm the network training. Recent studies combined finetuning (FT) from pretrained weights with SSL to mitigate the challenges and claimed superior results in the low-label regime. In this work, we first show that the better pretrained weights brought in by FT account for the state-of-the-art performance, and importantly that they are universally helpful to off-the-shelf semi-supervised learners. We further argue that direct finetuning from pretrained weights is suboptimal due to covariate shift and propose a contrastive target pretraining step to adapt model weights towards target dataset. We carried out extensive experiments on both classification and segmentation tasks by doing target pretraining then followed by semi-supervised finetuning. The promising results validate the efficacy of target pretraining for SSL, in particular in the low-label regime.