Li Ge

ROI-Guided Point Cloud Geometry Compression Towards Human and Machine Vision

Apr 19, 2025

Xie Liang, Gao Wei, Zhenghui Ming, Li Ge

Abstract: Point cloud data is pivotal in applications like autonomous driving, virtual reality, and robotics. However, its substantial volume poses significant challenges in storage and transmission. At high compression ratios, crucial semantic details are usually severely damaged, making it difficult to guarantee the accuracy of downstream tasks. To tackle this problem, we are the first to introduce a Region of Interest (ROI)-guided Point Cloud Geometry Compression (RPCGC) method for human and machine vision. Our framework employs a dual-branch parallel structure, where the base layer encodes and decodes a simplified version of the point cloud, and the enhancement layer refines this by focusing on geometry details. Furthermore, the residual information of the enhancement layer undergoes refinement through an ROI prediction network. This network generates mask information, which is then incorporated into the residuals, serving as a strong supervision signal. Additionally, we apply this mask information in the Rate-Distortion (RD) optimization process, with each point weighted in the distortion calculation. Our loss function includes RD loss and detection loss to better guide point cloud encoding for the machine. Experimental results demonstrate that RPCGC achieves exceptional compression performance and better detection accuracy (10% gain) than some learning-based compression methods at high bitrates on the ScanNet and SUN RGB-D datasets.

* ACM International Conference on Multimedia 2024
* 10 pages, 5 figures
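
As a rough illustration of the per-point weighting mentioned in the abstract above, the sketch below scales each point's distortion by a mask-derived weight before adding it to the rate term. The function name `roi_weighted_rd_loss`, the parameters `lam` and `roi_weight`, and the `rate + lam * distortion` form are assumptions made for illustration; the abstract does not give the paper's actual loss.

```python
def roi_weighted_rd_loss(rate_bits, point_errors, roi_mask, lam=1.0, roi_weight=2.0):
    """Hypothetical RD objective: rate + lam * ROI-weighted mean distortion.

    point_errors: per-point distortion values (e.g. squared distances).
    roi_mask: truthy for points inside a region of interest, falsy otherwise.
    """
    if len(point_errors) != len(roi_mask):
        raise ValueError("point_errors and roi_mask must have the same length")
    if not point_errors:
        raise ValueError("at least one point is required")
    # Assumed weighting: ROI points count roi_weight times, background points once.
    weights = [roi_weight if m else 1.0 for m in roi_mask]
    distortion = sum(w * e for w, e in zip(weights, point_errors)) / sum(weights)
    return rate_bits + lam * distortion
```

With this weighting, the same geometric error costs more when it falls inside the predicted ROI, which is the intuition behind steering bits toward regions that matter for detection.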

Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought

Apr 08, 2025

Yi Peng, Chris, Xiaokun Wang, Yichen Wei, Jiangbo Pei, Weijie Qiu, Ai Jian, Yunzhuo Hao, Jiachun Pan, Tianyidan Xie(+5 more)

Abstract: We introduce Skywork R1V, a multimodal reasoning model that extends the R1-series large language models (LLMs) to visual modalities via an efficient multimodal transfer method. Leveraging a lightweight visual projector, Skywork R1V facilitates seamless multimodal adaptation without necessitating retraining of either the foundational language model or the vision encoder. To strengthen visual-text alignment, we propose a hybrid optimization strategy that combines Iterative Supervised Fine-Tuning (SFT) with Group Relative Policy Optimization (GRPO), significantly enhancing cross-modal integration efficiency. Additionally, we introduce an adaptive-length Chain-of-Thought distillation approach for reasoning data generation. This approach dynamically optimizes reasoning chain lengths, thereby enhancing inference efficiency and preventing overthinking. Empirical evaluations demonstrate that Skywork R1V, with only 38B parameters, delivers competitive performance, achieving a score of 69.0 on the MMMU benchmark and 67.5 on MathVista. Meanwhile, it maintains robust textual reasoning performance, evidenced by scores of 72.0 on AIME and 94.0 on MATH500. The Skywork R1V model weights have been publicly released to promote openness and reproducibility.
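
For readers unfamiliar with GRPO, the core idea is that each sampled response is scored relative to the other responses drawn for the same prompt, rather than against a learned value function. The sketch below shows only that normalization step; the function name, the `eps` guard, and the use of the population standard deviation are assumptions, and nothing here reflects Skywork R1V's actual training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of responses to the same prompt.

    Returns (r - group mean) / (group std + eps) for each reward r.
    """
    if not rewards:
        raise ValueError("rewards must be non-empty")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Responses that beat their group's average receive positive advantages and the rest negative ones; when every response in a group earns the same reward, all advantages are zero and that group contributes no policy gradient.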

Exact Tensor Completion Powered by Arbitrary Linear Transforms

Feb 02, 2024

Li Ge, Xue Jiang, Lin Chen

Abstract: In this work, we study the tensor completion problem, which aims to perfectly recover a tensor from partial observations. Existing theoretical guarantees require the involved transform to be orthogonal, which hinders their application. In this paper, free of the constraints of isotropy or self-adjointness, we establish a theoretical guarantee of exact tensor completion with arbitrary linear transforms. To that end, we define a new tensor-tensor product, which leads us to a new definition of the tensor nuclear norm. Equipped with these tools, we design an efficient algorithm based on the alternating direction method of multipliers to solve the transformed tensor completion program, and we obtain the theoretical bound. Our model and proof greatly enhance the flexibility of tensor completion, and extensive experiments validate the superiority of the proposed method.
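
For orientation, nuclear-norm tensor completion is usually posed as the convex program below. This is the generic textbook form, not the paper's exact statement: the symbols $\mathcal{M}$ (the partially observed tensor), $\Omega$ (the set of observed indices), $P_\Omega$ (the projection that keeps observed entries and zeroes the rest) and $\|\cdot\|_{\star,L}$ (a tensor nuclear norm induced by a linear transform $L$) are notation assumed here.

```latex
\min_{\mathcal{X}} \; \|\mathcal{X}\|_{\star,L}
\quad \text{subject to} \quad
P_\Omega(\mathcal{X}) = P_\Omega(\mathcal{M}),
\qquad
\bigl[P_\Omega(\mathcal{X})\bigr]_{ijk} =
\begin{cases}
\mathcal{X}_{ijk}, & (i,j,k) \in \Omega, \\
0, & \text{otherwise.}
\end{cases}
```

Exact completion means the minimizer of this program equals $\mathcal{M}$ itself; according to the abstract, the paper's contribution is a guarantee of this kind that no longer requires $L$ to be orthogonal.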

Wideband Channel Estimation for mmWave MIMO Systems with Beam Squint

Sep 06, 2022

Li Ge, Xue Jiang, Lin Chen, Qibo Qin, Xingzhao Liu

Abstract: As antenna arrays and bandwidths grow, many existing narrowband channel estimation methods that ignore the effect of beam squint may suffer severe performance degradation in wideband millimeter-wave (mmWave) communication systems. In this letter, a wideband Newtonized orthogonal matching pursuit (wNOMP) algorithm is proposed to perform channel estimation. The proposed method, based on the minimum mean square error (MMSE) criterion, is optimal for Gaussian noise. In real communication systems, however, the noise commonly follows a non-Gaussian distribution. Accordingly, we extend the wideband channel estimation method via the minimum $\ell_p$-norm criterion, which enhances robustness against non-Gaussian noise. Simulations have been conducted to validate the superiority of the proposed method over other representative methods.
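
To see why an $\ell_p$ criterion can be more robust than a squared-error one, it helps to compare how much a single impulsive residual dominates the total cost for different values of $p$. The sketch below is a toy illustration only: the helper names `lp_cost` and `outlier_share` and the sample residuals are made up here, and the abstract does not state which range of $p$ or which estimator the paper actually uses.

```python
def lp_cost(residuals, p=2.0):
    """Sum of |r|^p over real or complex residuals (p = 2 gives squared error)."""
    if p <= 0:
        raise ValueError("p must be positive")
    return sum(abs(r) ** p for r in residuals)


def outlier_share(residuals, p):
    """Fraction of the total l_p cost contributed by the largest residual."""
    total = lp_cost(residuals, p)
    if total == 0:
        raise ValueError("all residuals are zero")
    return max(abs(r) for r in residuals) ** p / total


# Hypothetical residuals: two small errors plus one impulsive outlier.
residuals = [0.1, -0.2, 5.0]
share_l2 = outlier_share(residuals, 2.0)
share_l1 = outlier_share(residuals, 1.0)
```

For these residuals the outlier accounts for a smaller share of the cost at $p = 1$ than at $p = 2$, so a fit that minimizes the lower-order cost is pulled less strongly toward the impulsive sample.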

Discriminative-Region Attention and Orthogonal-View Generation Model for Vehicle Re-Identification

Apr 28, 2022

Huadong Li, Yuefeng Wang, Ying Wei, Lin Wang, Li Ge

Abstract: Vehicle re-identification (Re-ID) is urgently demanded to alleviate the pressure caused by the increasingly onerous task of urban traffic management. Multiple challenges hamper the application of vision-based vehicle Re-ID methods: (1) the appearances of different vehicles of the same brand/model are often similar; however, (2) the appearances of the same vehicle differ significantly from different viewpoints. Previous methods mainly use manually annotated multi-attribute datasets to help the network capture detailed cues and infer multiple views, thereby improving vehicle Re-ID performance. However, finely labeled vehicle datasets are usually unattainable in real application scenarios. Hence, we propose a Discriminative-Region Attention and Orthogonal-View Generation (DRA-OVG) model, which requires only identity (ID) labels to overcome the multiple challenges of vehicle Re-ID. The proposed DRA model automatically extracts discriminative region features, which can distinguish similar vehicles. The OVG model generates multi-view features based on the input view features to reduce the impact of viewpoint mismatches. Finally, the distance between vehicle appearances is represented by the discriminative region features and multi-view features together. Therefore, the significance of the pairwise distance measure between vehicles is enhanced in a complete feature space. Extensive experiments substantiate the effectiveness of each proposed component, and experimental results indicate that our approach achieves remarkable improvements over state-of-the-art vehicle Re-ID methods on the VehicleID and VeRi-776 datasets.

* 28 pages, 12 figures
