Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Zhiyong Xu

Refer to the report for detailed contributions

HunyuanVideo: A Systematic Framework For Large Video Generative Models

Dec 03, 2024

Weijie Kong, Qi Tian, Zijian Zhang, Rox Min, Zuozhuo Dai, Jin Zhou, Jiangfeng Xiong, Xin Li, Bo Wu, Jianwei Zhang(+42 more)

Figure 1 for HunyuanVideo: A Systematic Framework For Large Video Generative Models

Figure 2 for HunyuanVideo: A Systematic Framework For Large Video Generative Models

Figure 3 for HunyuanVideo: A Systematic Framework For Large Video Generative Models

Figure 4 for HunyuanVideo: A Systematic Framework For Large Video Generative Models

Abstract:Recent advancements in video generation have significantly impacted daily life for both individuals and industries. However, the leading video generation models remain closed-source, resulting in a notable performance gap between industry capabilities and those available to the public. In this report, we introduce HunyuanVideo, an innovative open-source video foundation model that demonstrates performance in video generation comparable to, or even surpassing, that of leading closed-source models. HunyuanVideo encompasses a comprehensive framework that integrates several key elements, including data curation, advanced architectural design, progressive model scaling and training, and an efficient infrastructure tailored for large-scale model training and inference. As a result, we successfully trained a video generative model with over 13 billion parameters, making it the largest among all open-source models. We conducted extensive experiments and implemented a series of targeted designs to ensure high visual quality, motion dynamics, text-video alignment, and advanced filming techniques. According to evaluations by professionals, HunyuanVideo outperforms previous state-of-the-art models, including Runway Gen-3, Luma 1.6, and three top-performing Chinese video generative models. By releasing the code for the foundation model and its applications, we aim to bridge the gap between closed-source and open-source communities. This initiative will empower individuals within the community to experiment with their ideas, fostering a more dynamic and vibrant video generation ecosystem. The code is publicly available at https://github.com/Tencent/HunyuanVideo.

Via

Access Paper or Ask Questions

STCMOT: Spatio-Temporal Cohesion Learning for UAV-Based Multiple Object Tracking

Sep 17, 2024

Jianbo Ma, Chuanming Tang, Fei Wu, Can Zhao, Jianlin Zhang, Zhiyong Xu

Figure 1 for STCMOT: Spatio-Temporal Cohesion Learning for UAV-Based Multiple Object Tracking

Figure 2 for STCMOT: Spatio-Temporal Cohesion Learning for UAV-Based Multiple Object Tracking

Figure 3 for STCMOT: Spatio-Temporal Cohesion Learning for UAV-Based Multiple Object Tracking

Figure 4 for STCMOT: Spatio-Temporal Cohesion Learning for UAV-Based Multiple Object Tracking

Abstract:Multiple object tracking (MOT) in Unmanned Aerial Vehicle (UAV) videos is important for diverse applications in computer vision. Current MOT trackers rely on accurate object detection results and precise matching of target reidentification (ReID). These methods focus on optimizing target spatial attributes while overlooking temporal cues in modelling object relationships, especially for challenging tracking conditions such as object deformation and blurring, etc. To address the above-mentioned issues, we propose a novel Spatio-Temporal Cohesion Multiple Object Tracking framework (STCMOT), which utilizes historical embedding features to model the representation of ReID and detection features in a sequential order. Concretely, a temporal embedding boosting module is introduced to enhance the discriminability of individual embedding based on adjacent frame cooperation. While the trajectory embedding is then propagated by a temporal detection refinement module to mine salient target locations in the temporal field. Extensive experiments on the VisDrone2019 and UAVDT datasets demonstrate our STCMOT sets a new state-of-the-art performance in MOTA and IDF1 metrics. The source codes are released at https://github.com/ydhcg-BoBo/STCMOT.

Via

Access Paper or Ask Questions

Performance Analysis of Photon-Limited Free-Space Optical Communications with Practical Photon-Counting Receivers

Aug 24, 2024

Chen Wang, Zhiyong Xu, Jingyuan Wang, Jianhua Li, Weifeng Mou, Huatao Zhu, Jiyong Zhao, Yang Su, Yimin Wang, Ailin Qi

Abstract:The non-perfect factors of practical photon-counting receiver are recognized as a significant challenge for long-distance photon-limited free-space optical (FSO) communication systems. This paper presents a comprehensive analytical framework for modeling the statistical properties of time-gated single-photon avalanche diode (TG-SPAD) based photon-counting receivers in presence of dead time, non-photon-number-resolving and afterpulsing effect. Drawing upon the non-Markovian characteristic of afterpulsing effect, we formulate a closed-form approximation for the probability mass function (PMF) of photon counts, when high-order pulse amplitude modulation (PAM) is used. Unlike the photon counts from a perfect photon-counting receiver, which adhere to a Poisson arrival process, the photon counts from a practical TG-SPAD based receiver are instead approximated by a binomial distribution. Additionally, by employing the maximum likelihood (ML) criterion, we derive a refined closed-form formula for determining the threshold in high-order PAM, thereby facilitating the development of an analytical model for the symbol error rate (SER). Utilizing this analytical SER model, the system performance is investigated. The numerical results underscore the crucial need to suppress background radiation below the tolerated threshold and to maintain a sufficient number of gates in order to achieve a target SER.

Via

Access Paper or Ask Questions

A Novel Signal Detection Method for Photon-Counting Communications with Nonlinear Distortion Effects

Aug 20, 2024

Chen Wang, Zhiyong Xu, Jingyuan Wang, Jianhua Li, Weifeng Mou, Huatao Zhu, Jiyong Zhao, Yang Su, Yimin Wang, Ailin Qi

Figure 1 for A Novel Signal Detection Method for Photon-Counting Communications with Nonlinear Distortion Effects

Figure 2 for A Novel Signal Detection Method for Photon-Counting Communications with Nonlinear Distortion Effects

Figure 3 for A Novel Signal Detection Method for Photon-Counting Communications with Nonlinear Distortion Effects

Figure 4 for A Novel Signal Detection Method for Photon-Counting Communications with Nonlinear Distortion Effects

Abstract:This paper proposes a method for estimating and detecting optical signals in practical photon-counting receivers. There are two important aspects of non-perfect photon-counting receivers, namely, (i) dead time which results in blocking loss, and (ii) non-photon-number-resolving, which leads to counting loss during the gate-ON interval. These factors introduce nonlinear distortion to the detected photon counts. The detected photon counts depend not only on the optical intensity but also on the signal waveform, and obey a Poisson binomial process. Using the discrete Fourier transform characteristic function (DFT-CF) method, we derive the probability mass function (PMF) of the detected photon counts. Furthermore, unlike conventional methods that assume an ideal rectangle wave, we propose a novel signal estimation and decision method applicable to arbitrary waveform. We demonstrate that the proposed method achieves superior error performance compared to conventional methods. The proposed algorithm has the potential to become an essential signal processing tool for photon-counting receivers.

Via

Access Paper or Ask Questions

RSTAR: Rotational Streak Artifact Reduction in 4D CBCT using Separable and Circular Convolutions

Mar 25, 2024

Ziheng Deng, Hua Chen, Haibo Hu, Zhiyong Xu, Tianling Lyu, Yan Xi, Yang Chen, Jun Zhao

Figure 1 for RSTAR: Rotational Streak Artifact Reduction in 4D CBCT using Separable and Circular Convolutions

Figure 2 for RSTAR: Rotational Streak Artifact Reduction in 4D CBCT using Separable and Circular Convolutions

Figure 3 for RSTAR: Rotational Streak Artifact Reduction in 4D CBCT using Separable and Circular Convolutions

Figure 4 for RSTAR: Rotational Streak Artifact Reduction in 4D CBCT using Separable and Circular Convolutions

Abstract:Four-dimensional cone-beam computed tomography (4D CBCT) provides respiration-resolved images and can be used for image-guided radiation therapy. However, the ability to reveal respiratory motion comes at the cost of image artifacts. As raw projection data are sorted into multiple respiratory phases, there is a limited number of cone-beam projections available for image reconstruction. Consequently, the 4D CBCT images are covered by severe streak artifacts. Although several deep learning-based methods have been proposed to address this issue, most algorithms employ ordinary network models, neglecting the intrinsic structural prior within 4D CBCT images. In this paper, we first explore the origin and appearance of streak artifacts in 4D CBCT images.Specifically, we find that streak artifacts exhibit a periodic rotational motion along with the patient's respiration. This unique motion pattern inspires us to distinguish the artifacts from the desired anatomical structures in the spatiotemporal domain. Thereafter, we propose a spatiotemporal neural network named RSTAR-Net with separable and circular convolutions for Rotational Streak Artifact Reduction. The specially designed model effectively encodes dynamic image features, facilitating the recovery of 4D CBCT images. Moreover, RSTAR-Net is also lightweight and computationally efficient. Extensive experiments substantiate the effectiveness of our proposed method, and RSTAR-Net shows superior performance to comparison methods.

Via

Access Paper or Ask Questions

Exploiting Spatial-Temporal Semantic Consistency for Video Scene Parsing

Sep 06, 2021

Xingjian He, Weining Wang, Zhiyong Xu, Hao Wang, Jie Jiang, Jing Liu

Figure 1 for Exploiting Spatial-Temporal Semantic Consistency for Video Scene Parsing

Figure 2 for Exploiting Spatial-Temporal Semantic Consistency for Video Scene Parsing

Figure 3 for Exploiting Spatial-Temporal Semantic Consistency for Video Scene Parsing

Figure 4 for Exploiting Spatial-Temporal Semantic Consistency for Video Scene Parsing

Abstract:Compared with image scene parsing, video scene parsing introduces temporal information, which can effectively improve the consistency and accuracy of prediction. In this paper, we propose a Spatial-Temporal Semantic Consistency method to capture class-exclusive context information. Specifically, we design a spatial-temporal consistency loss to constrain the semantic consistency in spatial and temporal dimensions. In addition, we adopt an pseudo-labeling strategy to enrich the training dataset. We obtain the scores of 59.84% and 58.85% mIoU on development (test part 1) and testing set of VSPW, respectively. And our method wins the 1st place on VSPW challenge at ICCV2021.

* 1st Place technical report for "The 1st Video Scene Parsing in the Wild Challenge" at ICCV2021

Via

Access Paper or Ask Questions

U-net super-neural segmentation and similarity calculation to realize vegetation change assessment in satellite imagery

Sep 10, 2019

Chunxue Wu, Bobo Ju, Naixue Xiong, Guisong Yang, Yan Wu, Hongming Yang, Jiaying Huang, Zhiyong Xu

Figure 1 for U-net super-neural segmentation and similarity calculation to realize vegetation change assessment in satellite imagery

Figure 2 for U-net super-neural segmentation and similarity calculation to realize vegetation change assessment in satellite imagery

Figure 3 for U-net super-neural segmentation and similarity calculation to realize vegetation change assessment in satellite imagery

Figure 4 for U-net super-neural segmentation and similarity calculation to realize vegetation change assessment in satellite imagery

Abstract:Vegetation is the natural linkage connecting soil, atmosphere and water. It can represent the change of land cover to a certain extent and serve as an indicator for global change research. Methods for measuring coverage can be divided into two types: surface measurement and remote sensing. Because vegetation cover has significant spatial and temporal differentiation characteristics, remote sensing has become an important technical means to estimate vegetation coverage. This paper firstly uses U-net to perform remote sensing image semantic segmentation training, then uses the result of semantic segmentation, and then uses the integral progressive method to calculate the forestland change rate, and finally realizes automated valuation of woodland change rate.

Via

Access Paper or Ask Questions