Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yuang Zhang

Consistent Video Colorization via Palette Guidance

Jan 31, 2025

Han Wang, Yuang Zhang, Yuhong Zhang, Lingxiao Lu, Li Song

Figure 1 for Consistent Video Colorization via Palette Guidance

Figure 2 for Consistent Video Colorization via Palette Guidance

Figure 3 for Consistent Video Colorization via Palette Guidance

Figure 4 for Consistent Video Colorization via Palette Guidance

Abstract:Colorization is a traditional computer vision task and it plays an important role in many time-consuming tasks, such as old film restoration. Existing methods suffer from unsaturated color and temporally inconsistency. In this paper, we propose a novel pipeline to overcome the challenges. We regard the colorization task as a generative task and introduce Stable Video Diffusion (SVD) as our base model. We design a palette-based color guider to assist the model in generating vivid and consistent colors. The color context introduced by the palette not only provides guidance for color generation, but also enhances the stability of the generated colors through a unified color context across multiple sequences. Experiments demonstrate that the proposed method can provide vivid and stable colors for videos, surpassing previous methods.

Via

Access Paper or Ask Questions

GradStop: Exploring Training Dynamics in Unsupervised Outlier Detection through Gradient Cohesion

Dec 11, 2024

Yuang Zhang, Liping Wang, Yihong Huang, Yuanxing Zheng

Figure 1 for GradStop: Exploring Training Dynamics in Unsupervised Outlier Detection through Gradient Cohesion

Figure 2 for GradStop: Exploring Training Dynamics in Unsupervised Outlier Detection through Gradient Cohesion

Figure 3 for GradStop: Exploring Training Dynamics in Unsupervised Outlier Detection through Gradient Cohesion

Figure 4 for GradStop: Exploring Training Dynamics in Unsupervised Outlier Detection through Gradient Cohesion

Abstract:Unsupervised Outlier Detection (UOD) is a critical task in data mining and machine learning, aiming to identify instances that significantly deviate from the majority. Without any label, deep UOD methods struggle with the misalignment between the model's direct optimization goal and the final performance goal of Outlier Detection (OD) task. Through the perspective of training dynamics, this paper proposes an early stopping algorithm to optimize the training of deep UOD models, ensuring they perform optimally in OD rather than overfitting the entire contaminated dataset. Inspired by UOD mechanism and inlier priority phenomenon, where intuitively models fit inliers more quickly than outliers, we propose GradStop, a sampling-based label-free algorithm to estimate model's real-time performance during training. First, a sampling method generates two sets: one likely containing more outliers and the other more inliers, then a metric based on gradient cohesion is applied to probe into current training dynamics, which reflects model's performance on OD task. Experimental results on 4 deep UOD algorithms and 47 real-world datasets and theoretical proofs demonstrate the effectiveness of our proposed early stopping algorithm in enhancing the performance of deep UOD models. Auto Encoder (AE) enhanced by GradStop achieves better performance than itself, other SOTA UOD methods, and even ensemble AEs. Our method provides a robust and effective solution to the problem of performance degradation during training, enabling deep UOD models to achieve better potential in anomaly detection tasks.

Via

Access Paper or Ask Questions

Seeing Through Pixel Motion: Learning Obstacle Avoidance from Optical Flow with One Camera

Nov 07, 2024

Yu Hu, Yuang Zhang, Yunlong Song, Yang Deng, Feng Yu, Linzuo Zhang, Weiyao Lin, Danping Zou, Wenxian Yu

Figure 1 for Seeing Through Pixel Motion: Learning Obstacle Avoidance from Optical Flow with One Camera

Figure 2 for Seeing Through Pixel Motion: Learning Obstacle Avoidance from Optical Flow with One Camera

Figure 3 for Seeing Through Pixel Motion: Learning Obstacle Avoidance from Optical Flow with One Camera

Figure 4 for Seeing Through Pixel Motion: Learning Obstacle Avoidance from Optical Flow with One Camera

Abstract:Optical flow captures the motion of pixels in an image sequence over time, providing information about movement, depth, and environmental structure. Flying insects utilize this information to navigate and avoid obstacles, allowing them to execute highly agile maneuvers even in complex environments. Despite its potential, autonomous flying robots have yet to fully leverage this motion information to achieve comparable levels of agility and robustness. Challenges of control from optical flow include extracting accurate optical flow at high speeds, handling noisy estimation, and ensuring robust performance in complex environments. To address these challenges, we propose a novel end-to-end system for quadrotor obstacle avoidance using monocular optical flow. We develop an efficient differentiable simulator coupled with a simplified quadrotor model, allowing our policy to be trained directly through first-order gradient optimization. Additionally, we introduce a central flow attention mechanism and an action-guided active sensing strategy that enhances the policy's focus on task-relevant optical flow observations to enable more responsive decision-making during flight. Our system is validated both in simulation and the real world using an FPV racing drone. Despite being trained in a simple environment in simulation, our system is validated both in simulation and the real world using an FPV racing drone. Despite being trained in a simple environment in simulation, our system demonstrates agile and robust flight in various unknown, cluttered environments in the real world at speeds of up to 6m/s.

Via

Access Paper or Ask Questions

Channel-Aware Throughput Maximization for Cooperative Data Fusion in CAV

Oct 06, 2024

Haonan An, Zhengru Fang, Yuang Zhang, Senkang Hu, Xianhao Chen, Guowen Xu, Yuguang Fang

Abstract:Connected and autonomous vehicles (CAVs) have garnered significant attention due to their extended perception range and enhanced sensing coverage. To address challenges such as blind spots and obstructions, CAVs employ vehicle-to-vehicle (V2V) communications to aggregate sensory data from surrounding vehicles. However, cooperative perception is often constrained by the limitations of achievable network throughput and channel quality. In this paper, we propose a channel-aware throughput maximization approach to facilitate CAV data fusion, leveraging a self-supervised autoencoder for adaptive data compression. We formulate the problem as a mixed integer programming (MIP) model, which we decompose into two sub-problems to derive optimal data rate and compression ratio solutions under given link conditions. An autoencoder is then trained to minimize bitrate with the determined compression ratio, and a fine-tuning strategy is employed to further reduce spectrum resource consumption. Experimental evaluation on the OpenCOOD platform demonstrates the effectiveness of our proposed algorithm, showing more than 20.19\% improvement in network throughput and a 9.38\% increase in average precision (AP@IoU) compared to state-of-the-art methods, with an optimal latency of 19.99 ms.

Via

Access Paper or Ask Questions

Back to Newton's Laws: Learning Vision-based Agile Flight via Differentiable Physics

Jul 16, 2024

Yuang Zhang, Yu Hu, Yunlong Song, Danping Zou, Weiyao Lin

Figure 1 for Back to Newton's Laws: Learning Vision-based Agile Flight via Differentiable Physics

Figure 2 for Back to Newton's Laws: Learning Vision-based Agile Flight via Differentiable Physics

Figure 3 for Back to Newton's Laws: Learning Vision-based Agile Flight via Differentiable Physics

Figure 4 for Back to Newton's Laws: Learning Vision-based Agile Flight via Differentiable Physics

Abstract:Swarm navigation in cluttered environments is a grand challenge in robotics. This work combines deep learning with first-principle physics through differentiable simulation to enable autonomous navigation of multiple aerial robots through complex environments at high speed. Our approach optimizes a neural network control policy directly by backpropagating loss gradients through the robot simulation using a simple point-mass physics model and a depth rendering engine. Despite this simplicity, our method excels in challenging tasks for both multi-agent and single-agent applications with zero-shot sim-to-real transfer. In multi-agent scenarios, our system demonstrates self-organized behavior, enabling autonomous coordination without communication or centralized planning - an achievement not seen in existing traditional or learning-based methods. In single-agent scenarios, our system achieves a 90% success rate in navigating through complex environments, significantly surpassing the 60% success rate of the previous state-of-the-art approach. Our system can operate without state estimation and adapt to dynamic obstacles. In real-world forest environments, it navigates at speeds up to 20 m/s, doubling the speed of previous imitation learning-based solutions. Notably, all these capabilities are deployed on a budget-friendly $21 computer, costing less than 5% of a GPU-equipped board used in existing systems. Video demonstrations are available at https://youtu.be/LKg9hJqc2cc.

Via

Access Paper or Ask Questions

MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance

Jun 28, 2024

Yuang Zhang, Jiaxi Gu, Li-Wen Wang, Han Wang, Junqi Cheng, Yuefeng Zhu, Fangyuan Zou

Abstract:In recent years, generative artificial intelligence has achieved significant advancements in the field of image generation, spawning a variety of applications. However, video generation still faces considerable challenges in various aspects, such as controllability, video length, and richness of details, which hinder the application and popularization of this technology. In this work, we propose a controllable video generation framework, dubbed MimicMotion, which can generate high-quality videos of arbitrary length mimicking specific motion guidance. Compared with previous methods, our approach has several highlights. Firstly, we introduce confidence-aware pose guidance that ensures high frame quality and temporal smoothness. Secondly, we introduce regional loss amplification based on pose confidence, which significantly reduces image distortion. Lastly, for generating long and smooth videos, we propose a progressive latent fusion strategy. By this means, we can produce videos of arbitrary length with acceptable resource consumption. With extensive experiments and user studies, MimicMotion demonstrates significant improvements over previous approaches in various aspects. Detailed results and comparisons are available on our project page: https://tencent.github.io/MimicMotion .

Via

Access Paper or Ask Questions

EntropyStop: Unsupervised Deep Outlier Detection with Loss Entropy

May 21, 2024

Yihong Huang, Yuang Zhang, Liping Wang, Fan Zhang, Xuemin Lin

Figure 1 for EntropyStop: Unsupervised Deep Outlier Detection with Loss Entropy

Figure 2 for EntropyStop: Unsupervised Deep Outlier Detection with Loss Entropy

Figure 3 for EntropyStop: Unsupervised Deep Outlier Detection with Loss Entropy

Figure 4 for EntropyStop: Unsupervised Deep Outlier Detection with Loss Entropy

Abstract:Unsupervised Outlier Detection (UOD) is an important data mining task. With the advance of deep learning, deep Outlier Detection (OD) has received broad interest. Most deep UOD models are trained exclusively on clean datasets to learn the distribution of the normal data, which requires huge manual efforts to clean the real-world data if possible. Instead of relying on clean datasets, some approaches directly train and detect on unlabeled contaminated datasets, leading to the need for methods that are robust to such conditions. Ensemble methods emerged as a superior solution to enhance model robustness against contaminated training sets. However, the training time is greatly increased by the ensemble. In this study, we investigate the impact of outliers on the training phase, aiming to halt training on unlabeled contaminated datasets before performance degradation. Initially, we noted that blending normal and anomalous data causes AUC fluctuations, a label-dependent measure of detection accuracy. To circumvent the need for labels, we propose a zero-label entropy metric named Loss Entropy for loss distribution, enabling us to infer optimal stopping points for training without labels. Meanwhile, we theoretically demonstrate negative correlation between entropy metric and the label-based AUC. Based on this, we develop an automated early-stopping algorithm, EntropyStop, which halts training when loss entropy suggests the maximum model detection capability. We conduct extensive experiments on ADBench (including 47 real datasets), and the overall results indicate that AutoEncoder (AE) enhanced by our approach not only achieves better performance than ensemble AEs but also requires under 1\% of training time. Lastly, our proposed metric and early-stopping approach are evaluated on other deep OD models, exhibiting their broad potential applicability.

Via

Access Paper or Ask Questions

SmartCooper: Vehicular Collaborative Perception with Adaptive Fusion and Judger Mechanism

Feb 02, 2024

Yuang Zhang, Haonan An, Zhengru Fang, Guowen Xu, Yuan Zhou, Xianhao Chen, Yuguang Fang

Abstract:In recent years, autonomous driving has garnered significant attention due to its potential for improving road safety through collaborative perception among connected and autonomous vehicles (CAVs). However, time-varying channel variations in vehicular transmission environments demand dynamic allocation of communication resources. Moreover, in the context of collaborative perception, it is important to recognize that not all CAVs contribute valuable data, and some CAV data even have detrimental effects on collaborative perception. In this paper, we introduce SmartCooper, an adaptive collaborative perception framework that incorporates communication optimization and a judger mechanism to facilitate CAV data fusion. Our approach begins with optimizing the connectivity of vehicles while considering communication constraints. We then train a learnable encoder to dynamically adjust the compression ratio based on the channel state information (CSI). Subsequently, we devise a judger mechanism to filter the detrimental image data reconstructed by adaptive decoders. We evaluate the effectiveness of our proposed algorithm on the OpenCOOD platform. Our results demonstrate a substantial reduction in communication costs by 23.10\% compared to the non-judger scheme. Additionally, we achieve a significant improvement on the average precision of Intersection over Union (AP@IoU) by 7.15\% compared with state-of-the-art schemes.

Via

Access Paper or Ask Questions

VLM-Eval: A General Evaluation on Video Large Language Models

Nov 20, 2023

Shuailin Li, Yuang Zhang, Yucheng Zhao, Qiuyue Wang, Fan Jia, Yingfei Liu, Tiancai Wang

Figure 1 for VLM-Eval: A General Evaluation on Video Large Language Models

Figure 2 for VLM-Eval: A General Evaluation on Video Large Language Models

Figure 3 for VLM-Eval: A General Evaluation on Video Large Language Models

Figure 4 for VLM-Eval: A General Evaluation on Video Large Language Models

Abstract:Despite the rapid development of video Large Language Models (LLMs), a comprehensive evaluation is still absent. In this paper, we introduce a unified evaluation that encompasses multiple video tasks, including captioning, question and answering, retrieval, and action recognition. In addition to conventional metrics, we showcase how GPT-based evaluation can match human-like performance in assessing response quality across multiple aspects. We propose a simple baseline: Video-LLaVA, which uses a single linear projection and outperforms existing video LLMs. Finally, we evaluate video LLMs beyond academic datasets, which show encouraging recognition and reasoning capabilities in driving scenarios with only hundreds of video-instruction pairs for fine-tuning. We hope our work can serve as a unified evaluation for video LLMs, and help expand more practical scenarios. The evaluation code will be available soon.

Via

Access Paper or Ask Questions

OnlineRefer: A Simple Online Baseline for Referring Video Object Segmentation

Jul 18, 2023

Dongming Wu, Tiancai Wang, Yuang Zhang, Xiangyu Zhang, Jianbing Shen

Figure 1 for OnlineRefer: A Simple Online Baseline for Referring Video Object Segmentation

Figure 2 for OnlineRefer: A Simple Online Baseline for Referring Video Object Segmentation

Figure 3 for OnlineRefer: A Simple Online Baseline for Referring Video Object Segmentation

Figure 4 for OnlineRefer: A Simple Online Baseline for Referring Video Object Segmentation

Abstract:Referring video object segmentation (RVOS) aims at segmenting an object in a video following human instruction. Current state-of-the-art methods fall into an offline pattern, in which each clip independently interacts with text embedding for cross-modal understanding. They usually present that the offline pattern is necessary for RVOS, yet model limited temporal association within each clip. In this work, we break up the previous offline belief and propose a simple yet effective online model using explicit query propagation, named OnlineRefer. Specifically, our approach leverages target cues that gather semantic information and position prior to improve the accuracy and ease of referring predictions for the current frame. Furthermore, we generalize our online model into a semi-online framework to be compatible with video-based backbones. To show the effectiveness of our method, we evaluate it on four benchmarks, \ie, Refer-Youtube-VOS, Refer-DAVIS17, A2D-Sentences, and JHMDB-Sentences. Without bells and whistles, our OnlineRefer with a Swin-L backbone achieves 63.5 J&F and 64.8 J&F on Refer-Youtube-VOS and Refer-DAVIS17, outperforming all other offline methods.

* Accepted by ICCV2023. The code is at https://github.com/wudongming97/OnlineRefer

Via

Access Paper or Ask Questions