Abstract:Large vision-language models (LVLMs) have shown promising performance on a variety of vision-language tasks. However, they remain susceptible to hallucinations, generating outputs misaligned with visual content or instructions. While various mitigation strategies have been proposed, they often neglect a key contributor to hallucinations: lack of fine-grained reasoning supervision during training. Without intermediate reasoning steps, models may establish superficial shortcuts between instructions and responses, failing to internalize the inherent reasoning logic. To address this challenge, we propose reflective instruction tuning, which integrates rationale learning into visual instruction tuning. Unlike previous methods that learn only from responses, our approach has the model predict rationales justifying why responses are correct or incorrect. This fosters a deeper engagement with the fine-grained reasoning underlying each response, thus enhancing the model's reasoning proficiency. To facilitate this approach, we propose REVERIE, the first large-scale instruction-tuning dataset with ReflEctiVE RatIonalE annotations. REVERIE comprises 115k machine-generated reasoning instructions, each meticulously annotated with a corresponding pair of correct and confusing responses, alongside comprehensive rationales elucidating the justification behind the correctness or erroneousness of each response. Experimental results on multiple LVLM benchmarks reveal that reflective instruction tuning with the REVERIE dataset yields noticeable performance gains over the baseline model, demonstrating the effectiveness of reflecting on the rationales. The project page is at https://zjr2000.github.io/projects/reverie.
Abstract:Hypernetworks, or hypernets for short, are neural networks that generate weights for another neural network, known as the target network. They have emerged as a powerful deep learning technique that allows for greater flexibility, adaptability, faster training, information sharing, and model compression. Hypernets have shown promising results in a variety of deep learning problems, including continual learning, causal inference, transfer learning, weight pruning, uncertainty quantification, zero-shot learning, natural language processing, and reinforcement learning. Despite their success across different problem settings, no review is currently available to inform researchers about these developments and help them utilize hypernets. To fill this gap, we review the progress in hypernets. We present an illustrative example of training deep neural networks using hypernets and propose to categorize hypernets along five criteria that affect their design: inputs, outputs, variability of inputs, variability of outputs, and the architecture of the hypernet. We also review applications of hypernets across different deep learning problem settings. Finally, we discuss the challenges and future directions that remain under-explored in the field of hypernets. We believe that hypernetworks have the potential to revolutionize the field of deep learning. They offer a new way to design and train neural networks, and they have the potential to improve the performance of deep learning models on a variety of tasks. Through this review, we aim to inspire further advancements in deep learning through hypernetworks.
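The core idea of a hypernetwork can be sketched in a few lines: the hypernet maps a conditioning embedding to the weights of a target network, which is then evaluated as usual. The linear hypernet, the embedding, and all numbers below are purely illustrative, not drawn from any surveyed work.

```python
def hypernet(embedding, hyper_weights):
    """Linear hypernet: each generated target weight is a linear
    combination of the conditioning embedding's entries."""
    return [sum(w * e for w, e in zip(row, embedding)) for row in hyper_weights]

def target_net(x, weights):
    """Target network: a single linear neuron on input vector x,
    using the weights produced by the hypernet."""
    return sum(w * xi for w, xi in zip(weights, x))

# Hypothetical setup: a 2-d embedding generates 3 target weights.
hyper_weights = [[1.0, 0.0],
                 [0.0, 1.0],
                 [0.5, 0.5]]
embedding = [2.0, 4.0]

w = hypernet(embedding, hyper_weights)  # generated weights: [2.0, 4.0, 3.0]
y = target_net([1.0, 1.0, 1.0], w)      # 2 + 4 + 3 = 9.0
```

In training, gradients flow through the generated weights back into the hypernet's own parameters, so only the hypernet is optimized; changing the embedding changes the entire target network, which is what enables weight sharing and task conditioning.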
Abstract:In this letter, we investigate time-domain channel estimation for wideband millimeter wave (mmWave) MIMO-OFDM systems. By transmitting frequency-domain pilot symbols with different beamforming vectors, we observe that the time-domain mmWave MIMO channels exhibit delay sparsity and, in particular, block sparsity across different spatial directions. We then propose a time-domain channel estimation exploiting block sparsity (TDCEBS) scheme, which at each iteration selects the nonzero block with the largest projection onto the residual. In particular, we evaluate the system performance using QuaDRiGa, the channel simulator recommended for 5G New Radio, to generate wideband mmWave MIMO channels. The effectiveness of the proposed TDCEBS scheme is verified by the simulation results, as the proposed scheme outperforms the existing schemes.
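The greedy block-selection step described above can be sketched as follows. For illustration this assumes an identity sensing matrix, so the projection of the residual onto a block reduces to the block's residual energy, and the per-block least-squares fit is just a copy; the actual TDCEBS scheme operates on beamformed pilot observations. All tap values below are hypothetical.

```python
def block_greedy_estimate(y, block_size, n_blocks_to_pick):
    """Greedy block-sparse estimation: at each iteration pick the block
    where the residual has the largest energy (projection), explain it,
    and update the residual."""
    residual = list(y)
    estimate = [0.0] * len(y)
    n_blocks = len(y) // block_size
    chosen = set()
    for _ in range(n_blocks_to_pick):
        # Residual energy in each not-yet-chosen block.
        best_b, best_e = None, -1.0
        for b in range(n_blocks):
            if b in chosen:
                continue
            e = sum(residual[i] ** 2
                    for i in range(b * block_size, (b + 1) * block_size))
            if e > best_e:
                best_b, best_e = b, e
        chosen.add(best_b)
        for i in range(best_b * block_size, (best_b + 1) * block_size):
            estimate[i] = y[i]  # least-squares fit under identity sensing
            residual[i] = 0.0   # this block is now explained
    return estimate, sorted(chosen)

# Hypothetical delay-domain channel: 8 taps, block size 2, two active blocks.
y = [0.0, 0.1, 3.0, 2.5, 0.0, 0.0, 1.5, 1.0]
est, blocks = block_greedy_estimate(y, block_size=2, n_blocks_to_pick=2)
# blocks -> [1, 3]: the two blocks with the most residual energy
```

The block structure is what distinguishes this from plain greedy pursuit: taps are selected a whole spatial-direction block at a time, matching the observed block sparsity.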
Abstract:In the geophysical field, seismic noise attenuation has been considered a critical and long-standing problem, especially for pre-stack data processing. Here, we propose a deep-learning model for this task. Rather than directly applying an existing de-noising model for ordinary images to seismic data, we have designed a dedicated deep-learning model based on residual neural networks. Named N2N-Seismic, it has a strong ability to recover seismic signals to intact condition while preserving the primary signals. The proposed model, which achieves great success in attenuating noise, has been tested on two different seismic datasets. Several metrics show that our method outperforms conventional approaches in terms of signal-to-noise ratio, mean squared error, phase spectrum, etc. Moreover, robustness tests on removing random noise of both strong and weak intensities have been extensively scrutinized to make sure that the proposed model maintains a good level of adaptation when dealing with large variations in noise characteristics and intensities.
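The residual-learning formulation behind such denoisers, where the network predicts the noise component and the clean signal is recovered by subtraction, can be illustrated with a toy 1-D sketch. The moving-average-based noise predictor below is only a stand-in for the actual residual network, and the trace values are hypothetical.

```python
def noise_predictor(noisy):
    """Stand-in for the residual network: estimate the noise as the
    deviation of each sample from a 3-tap moving average (a crude
    high-pass filter)."""
    n = len(noisy)
    smooth = [(noisy[max(i - 1, 0)] + noisy[i] + noisy[min(i + 1, n - 1)]) / 3
              for i in range(n)]
    return [x - s for x, s in zip(noisy, smooth)]

def denoise_residual(noisy):
    """Residual learning: predict the noise, then subtract it from the
    input to recover the signal."""
    predicted_noise = noise_predictor(noisy)
    return [x - r for x, r in zip(noisy, predicted_noise)]

# Hypothetical noisy trace segment.
trace = [1.0, 2.0, 3.0]
clean = denoise_residual(trace)  # equals the smoothed trace by construction
```

Learning the residual rather than the clean signal directly is generally easier for the network, since the noise component typically has smaller magnitude and simpler statistics than the full seismic trace.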
Abstract:We propose a predictive neural network architecture that can be utilized to update reference velocity models as inputs to full waveform inversion (FWI). Deep learning models are explored to augment velocity model building workflows during 3D seismic volume reprocessing in salt-prone environments. Specifically, a neural network architecture with 3D convolutional, deconvolutional, and 3D max-pooling layers is designed to take standard-amplitude 3D seismic volumes as input. Enhanced data augmentation through generative adversarial networks and a weighted loss function enable the network to train with only a few sparsely annotated slices. Batch normalization is also applied for faster convergence. Moreover, a 3D probability cube for salt bodies is generated through ensembles of predictions from multiple models in order to reduce variance. Velocity models inferred from the proposed networks provide opportunities for FWI forward models to converge faster with an initial condition closer to the true model. In each iteration step, the probability cubes of salt bodies inferred from the proposed networks can be used as a regularization term in FWI forward modelling, which may result in an improved velocity model estimation, while the output of seismic migration can be utilized as an input of the 3D neural network for subsequent iterations.
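The variance-reduction step, averaging the salt-probability cubes predicted by multiple models into a single ensemble cube, can be sketched as follows; cubes are flattened to 1-D lists for brevity, and the probability values are hypothetical.

```python
def ensemble_probability(cubes):
    """Average per-voxel salt probabilities across the predictions of
    several models to reduce variance (cubes flattened to 1-D lists)."""
    n = len(cubes)
    return [sum(vals) / n for vals in zip(*cubes)]

# Hypothetical predictions from three independently trained models.
cubes = [[0.9, 0.2, 0.6],
         [0.7, 0.4, 0.6],
         [0.8, 0.3, 0.6]]
prob = ensemble_probability(cubes)  # approximately [0.8, 0.3, 0.6]
```

Voxels where the models disagree end up with intermediate probabilities, which is exactly the behavior a soft regularization term in FWI can exploit.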
Abstract:Billion-scale high-dimensional approximate nearest neighbour (ANN) search has become an important problem for searching similar objects among the vast amount of images and videos available online. Existing ANN methods are usually characterized by their specific indexing structures, including the inverted index and the inverted multi-index. The inverted index structure is amenable to GPU-based implementations, and state-of-the-art systems such as Faiss are able to exploit the massive parallelism offered by GPUs. However, the inverted index requires high memory overhead to index the dataset effectively. The inverted multi-index is difficult to implement for GPUs, and also ineffective for databases with different data distributions. In this paper we propose a novel hierarchical inverted index structure generated by vector and line quantization methods. Our quantization method improves both search efficiency and accuracy, while maintaining comparable memory consumption. This is achieved by reducing the search space and increasing the number of indexed regions. We introduce a new ANN search system, VLQ-ADC, that is based on the proposed inverted index, and perform extensive evaluation on two public billion-scale benchmark datasets, SIFT1B and DEEP1B. Our evaluation shows that VLQ-ADC significantly outperforms state-of-the-art GPU- and CPU-based systems in terms of both accuracy and search speed.
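The basic inverted-index mechanism underlying such systems can be sketched in a few lines: database vectors are assigned to coarse quantization cells, and a query scans only its nearest cell(s) rather than the whole database. The toy vectors and centroids below are illustrative; VLQ-ADC's hierarchical vector-and-line quantization refines this idea to get more, finer regions at comparable memory cost.

```python
def dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_inverted_index(vectors, centroids):
    """Assign each database vector to its nearest centroid's cell."""
    index = {c: [] for c in range(len(centroids))}
    for vid, v in enumerate(vectors):
        cell = min(range(len(centroids)), key=lambda c: dist2(v, centroids[c]))
        index[cell].append(vid)
    return index

def ann_search(query, vectors, centroids, index, n_probe=1):
    """Scan only the n_probe cells nearest to the query."""
    cells = sorted(range(len(centroids)),
                   key=lambda c: dist2(query, centroids[c]))[:n_probe]
    candidates = [vid for c in cells for vid in index[c]]
    return min(candidates, key=lambda vid: dist2(query, vectors[vid]))

# Hypothetical toy data: two well-separated clusters.
centroids = [[0.0, 0.0], [10.0, 10.0]]
vectors = [[0.1, 0.2], [0.3, -0.1], [9.9, 10.0], [10.2, 9.7]]
index = build_inverted_index(vectors, centroids)
nearest = ann_search([10.0, 10.0], vectors, centroids, index)  # vector id 2
```

With one probed cell, only half the database is scanned here; at billion scale the same principle cuts the candidate set by orders of magnitude, which is why the number and quality of indexed regions matter so much.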
Abstract:Object detection is an important yet challenging task in video understanding and analysis, where one major challenge lies in the proper balance between two conflicting factors: detection accuracy and detection speed. In this paper, we propose a new adaptive patch-of-interest composition approach for improving both the accuracy and speed of object detection. The proposed approach first extracts patches in a video frame that have the potential to include objects of interest. Then, an adaptive composition process is introduced to compose the extracted patches into an optimal number of sub-frames for object detection. With this process, we are able to maintain the resolution of the original frame during object detection (guaranteeing accuracy), while minimizing the number of inputs to detection (boosting speed). Experimental results on various datasets demonstrate the effectiveness of the proposed approach.