Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jian Shi

HOGSA: Bimanual Hand-Object Interaction Understanding with 3D Gaussian Splatting Based Data Augmentation

Jan 06, 2025

Wentian Qu, Jiahe Li, Jian Cheng, Jian Shi, Chenyu Meng, Cuixia Ma, Hongan Wang, Xiaoming Deng, Yinda Zhang

Figure 1 for HOGSA: Bimanual Hand-Object Interaction Understanding with 3D Gaussian Splatting Based Data Augmentation

Figure 2 for HOGSA: Bimanual Hand-Object Interaction Understanding with 3D Gaussian Splatting Based Data Augmentation

Figure 3 for HOGSA: Bimanual Hand-Object Interaction Understanding with 3D Gaussian Splatting Based Data Augmentation

Figure 4 for HOGSA: Bimanual Hand-Object Interaction Understanding with 3D Gaussian Splatting Based Data Augmentation

Abstract:Understanding of bimanual hand-object interaction plays an important role in robotics and virtual reality. However, due to significant occlusions between hands and object as well as the high degree-of-freedom motions, it is challenging to collect and annotate a high-quality, large-scale dataset, which prevents further improvement of bimanual hand-object interaction-related baselines. In this work, we propose a new 3D Gaussian Splatting based data augmentation framework for bimanual hand-object interaction, which is capable of augmenting existing dataset to large-scale photorealistic data with various hand-object pose and viewpoints. First, we use mesh-based 3DGS to model objects and hands, and to deal with the rendering blur problem due to multi-resolution input images used, we design a super-resolution module. Second, we extend the single hand grasping pose optimization module for the bimanual hand object to generate various poses of bimanual hand-object interaction, which can significantly expand the pose distribution of the dataset. Third, we conduct an analysis for the impact of different aspects of the proposed data augmentation on the understanding of the bimanual hand-object interaction. We perform our data augmentation on two benchmarks, H2O and Arctic, and verify that our method can improve the performance of the baselines.

* Accepted by AAAI2025

Via

Access Paper or Ask Questions

Amodal Depth Anything: Amodal Depth Estimation in the Wild

Dec 03, 2024

Zhenyu Li, Mykola Lavreniuk, Jian Shi, Shariq Farooq Bhat, Peter Wonka

Abstract:Amodal depth estimation aims to predict the depth of occluded (invisible) parts of objects in a scene. This task addresses the question of whether models can effectively perceive the geometry of occluded regions based on visible cues. Prior methods primarily rely on synthetic datasets and focus on metric depth estimation, limiting their generalization to real-world settings due to domain shifts and scalability challenges. In this paper, we propose a novel formulation of amodal depth estimation in the wild, focusing on relative depth prediction to improve model generalization across diverse natural images. We introduce a new large-scale dataset, Amodal Depth In the Wild (ADIW), created using a scalable pipeline that leverages segmentation datasets and compositing techniques. Depth maps are generated using large pre-trained depth models, and a scale-and-shift alignment strategy is employed to refine and blend depth predictions, ensuring consistency in ground-truth annotations. To tackle the amodal depth task, we present two complementary frameworks: Amodal-DAV2, a deterministic model based on Depth Anything V2, and Amodal-DepthFM, a generative model that integrates conditional flow matching principles. Our proposed frameworks effectively leverage the capabilities of large pre-trained models with minimal modifications to achieve high-quality amodal depth predictions. Experiments validate our design choices, demonstrating the flexibility of our models in generating diverse, plausible depth structures for occluded regions. Our method achieves a 69.5% improvement in accuracy over the previous SoTA on the ADIW dataset.

Via

Access Paper or Ask Questions

SPAC-Net: Rethinking Point Cloud Completion with Structural Prior

Nov 22, 2024

Zizhao Wu, Jian Shi, Xuan Deng, Cheng Zhang, Genfu Yang, Ming Zeng, Yunhai Wang

Abstract:Point cloud completion aims to infer a complete shape from its partial observation. Many approaches utilize a pure encoderdecoder paradigm in which complete shape can be directly predicted by shape priors learned from partial scans, however, these methods suffer from the loss of details inevitably due to the feature abstraction issues. In this paper, we propose a novel framework,termed SPAC-Net, that aims to rethink the completion task under the guidance of a new structural prior, we call it interface. Specifically, our method first investigates Marginal Detector (MAD) module to localize the interface, defined as the intersection between the known observation and the missing parts. Based on the interface, our method predicts the coarse shape by learning the displacement from the points in interface move to their corresponding position in missing parts. Furthermore, we devise an additional Structure Supplement(SSP) module before the upsampling stage to enhance the structural details of the coarse shape, enabling the upsampling module to focus more on the upsampling task. Extensive experiments have been conducted on several challenging benchmarks, and the results demonstrate that our method outperforms existing state-of-the-art approaches.

Via

Access Paper or Ask Questions

StereoCrafter-Zero: Zero-Shot Stereo Video Generation with Noisy Restart

Nov 21, 2024

Jian Shi, Qian Wang, Zhenyu Li, Peter Wonka

Abstract:Generating high-quality stereo videos that mimic human binocular vision requires maintaining consistent depth perception and temporal coherence across frames. While diffusion models have advanced image and video synthesis, generating high-quality stereo videos remains challenging due to the difficulty of maintaining consistent temporal and spatial coherence between left and right views. We introduce \textit{StereoCrafter-Zero}, a novel framework for zero-shot stereo video generation that leverages video diffusion priors without the need for paired training data. Key innovations include a noisy restart strategy to initialize stereo-aware latents and an iterative refinement process that progressively harmonizes the latent space, addressing issues like temporal flickering and view inconsistencies. Comprehensive evaluations, including quantitative metrics and user studies, demonstrate that \textit{StereoCrafter-Zero} produces high-quality stereo videos with improved depth consistency and temporal smoothness, even when depth estimations are imperfect. Our framework is robust and adaptable across various diffusion models, setting a new benchmark for zero-shot stereo video generation and enabling more immersive visual experiences. Our code can be found in~\url{https://github.com/shijianjian/StereoCrafter-Zero}.

Via

Access Paper or Ask Questions

OCMG-Net: Neural Oriented Normal Refinement for Unstructured Point Clouds

Sep 02, 2024

Yingrui Wu, Mingyang Zhao, Weize Quan, Jian Shi, Xiaohong Jia, Dong-Ming Yan

Abstract:We present a robust refinement method for estimating oriented normals from unstructured point clouds. In contrast to previous approaches that either suffer from high computational complexity or fail to achieve desirable accuracy, our novel framework incorporates sign orientation and data augmentation in the feature space to refine the initial oriented normals, striking a balance between efficiency and accuracy. To address the issue of noise-caused direction inconsistency existing in previous approaches, we introduce a new metric called the Chamfer Normal Distance, which faithfully minimizes the estimation error by correcting the annotated normal with the closest point found on the potentially clean point cloud. This metric not only tackles the challenge but also aids in network training and significantly enhances network robustness against noise. Moreover, we propose an innovative dual-parallel architecture that integrates Multi-scale Local Feature Aggregation and Hierarchical Geometric Information Fusion, which enables the network to capture intricate geometric details more effectively and notably reduces ambiguity in scale selection. Extensive experiments demonstrate the superiority and versatility of our method in both unoriented and oriented normal estimation tasks across synthetic and real-world datasets among indoor and outdoor scenarios. The code is available at https://github.com/YingruiWoo/OCMG-Net.git.

* 18 pages, 16 figures

Via

Access Paper or Ask Questions

Leveraging 3D LiDAR Sensors to Enable Enhanced Urban Safety and Public Health: Pedestrian Monitoring and Abnormal Activity Detection

Apr 17, 2024

Nawfal Guefrachi, Jian Shi, Hakim Ghazzai, Ahmad Alsharoa

Abstract:The integration of Light Detection and Ranging (LiDAR) and Internet of Things (IoT) technologies offers transformative opportunities for public health informatics in urban safety and pedestrian well-being. This paper proposes a novel framework utilizing these technologies for enhanced 3D object detection and activity classification in urban traffic scenarios. By employing elevated LiDAR, we obtain detailed 3D point cloud data, enabling precise pedestrian activity monitoring. To overcome urban data scarcity, we create a specialized dataset through simulated traffic environments in Blender, facilitating targeted model training. Our approach employs a modified Point Voxel-Region-based Convolutional Neural Network (PV-RCNN) for robust 3D detection and PointNet for classifying pedestrian activities, significantly benefiting urban traffic management and public health by offering insights into pedestrian behavior and promoting safer urban environments. Our dual-model approach not only enhances urban traffic management but also contributes significantly to public health by providing insights into pedestrian behavior and promoting safer urban environment.

Via

Access Paper or Ask Questions

HDA-LVIO: A High-Precision LiDAR-Visual-Inertial Odometry in Urban Environments with Hybrid Data Association

Mar 11, 2024

Jian Shi, Wei Wang, Mingyang Qi, Xin Li, Ye Yan

Abstract:To enhance localization accuracy in urban environments, an innovative LiDAR-Visual-Inertial odometry, named HDA-LVIO, is proposed by employing hybrid data association. The proposed HDA_LVIO system can be divided into two subsystems: the LiDAR-Inertial subsystem (LIS) and the Visual-Inertial subsystem (VIS). In the LIS, the LiDAR pointcloud is utilized to calculate the Iterative Closest Point (ICP) error, serving as the measurement value of Error State Iterated Kalman Filter (ESIKF) to construct the global map. In the VIS, an incremental method is firstly employed to adaptively extract planes from the global map. And the centroids of these planes are projected onto the image to obtain projection points. Then, feature points are extracted from the image and tracked along with projection points using Lucas-Kanade (LK) optical flow. Next, leveraging the vehicle states from previous intervals, sliding window optimization is performed to estimate the depth of feature points. Concurrently, a method based on epipolar geometric constraints is proposed to address tracking failures for feature points, which can improve the accuracy of depth estimation for feature points by ensuring sufficient parallax within the sliding window. Subsequently, the feature points and projection points are hybridly associated to construct reprojection error, serving as the measurement value of ESIKF to estimate vehicle states. Finally, the localization accuracy of the proposed HDA-LVIO is validated using public datasets and data from our equipment. The results demonstrate that the proposed algorithm achieves obviously improvement in localization accuracy compared to various existing algorithms.

Via

Access Paper or Ask Questions

Emulating Complex Synapses Using Interlinked Proton Conductors

Jan 26, 2024

Lifu Zhang, Ji-An Li, Yang Hu, Jie Jiang, Rongjie Lai, Marcus K. Benna, Jian Shi

Abstract:In terms of energy efficiency and computational speed, neuromorphic electronics based on non-volatile memory devices is expected to be one of most promising hardware candidates for future artificial intelligence (AI). However, catastrophic forgetting, networks rapidly overwriting previously learned weights when learning new tasks, remains as a pivotal hurdle in either digital or analog AI chips for unleashing the true power of brain-like computing. To address catastrophic forgetting in the context of online memory storage, a complex synapse model (the Benna-Fusi model) has been proposed recently[1], whose synaptic weight and internal variables evolve following a diffusion dynamics. In this work, by designing a proton transistor with a series of charge-diffusion-controlled storage components, we have experimentally realized the Benna-Fusi artificial complex synapse. The memory consolidation from coupled storage components is revealed by both numerical simulations and experimental observations. Different memory timescales for the complex synapse are engineered by the diffusion length of charge carriers, the capacity and number of coupled storage components. The advantage of the demonstrated complex synapse in both memory capacity and memory consolidation is revealed by neural network simulations of face familiarity detection. Our experimental realization of the complex synapse suggests a promising approach to enhance memory capacity and to enable continual learning.

* 6 figures

Via

Access Paper or Ask Questions

VoxelKP: A Voxel-based Network Architecture for Human Keypoint Estimation in LiDAR Data

Dec 11, 2023

Jian Shi, Peter Wonka

Abstract:We present \textit{VoxelKP}, a novel fully sparse network architecture tailored for human keypoint estimation in LiDAR data. The key challenge is that objects are distributed sparsely in 3D space, while human keypoint detection requires detailed local information wherever humans are present. We propose four novel ideas in this paper. First, we propose sparse selective kernels to capture multi-scale context. Second, we introduce sparse box-attention to focus on learning spatial correlations between keypoints within each human instance. Third, we incorporate a spatial encoding to leverage absolute 3D coordinates when projecting 3D voxels to a 2D grid encoding a bird's eye view. Finally, we propose hybrid feature learning to combine the processing of per-voxel features with sparse convolution. We evaluate our method on the Waymo dataset and achieve an improvement of $27\%$ on the MPJPE metric compared to the state-of-the-art, \textit{HUM3DIL}, trained on the same data, and $12\%$ against the state-of-the-art, \textit{GC-KPL}, pretrained on a $25\times$ larger dataset. To the best of our knowledge, \textit{VoxelKP} is the first single-staged, fully sparse network that is specifically designed for addressing the challenging task of 3D keypoint estimation from LiDAR data, achieving state-of-the-art performances. Our code is available at \url{https://github.com/shijianjian/VoxelKP}.

Via

Access Paper or Ask Questions

AGAD: Adversarial Generative Anomaly Detection

Apr 09, 2023

Jian Shi, Ni Zhang

Abstract:Anomaly detection suffered from the lack of anomalies due to the diversity of abnormalities and the difficulties of obtaining large-scale anomaly data. Semi-supervised anomaly detection methods are often used to solely leverage normal data to detect abnormalities that deviated from the learnt normality distributions. Meanwhile, given the fact that limited anomaly data can be obtained with a minor cost in practice, some researches also investigated anomaly detection methods under supervised scenarios with limited anomaly data. In order to address the lack of abnormal data for robust anomaly detection, we propose Adversarial Generative Anomaly Detection (AGAD), a self-contrast-based anomaly detection paradigm that learns to detect anomalies by generating \textit{contextual adversarial information} from the massive normal examples. Essentially, our method generates pseudo-anomaly data for both supervised and semi-supervised anomaly detection scenarios. Extensive experiments are carried out on multiple benchmark datasets and real-world datasets, the results show significant improvement in both supervised and semi-supervised scenarios. Importantly, our approach is data-efficient that can boost up the detection accuracy with no more than 5% anomalous training data.

Via

Access Paper or Ask Questions