Sicheng Wang

A GAN-Enhanced Deep Learning Framework for Rooftop Detection from Historical Aerial Imagery

Mar 29, 2025

Pengyu Chen, Sicheng Wang, Cuizhen Wang, Senrong Wang, Beiao Huang, Lu Huang, Zhe Zang

Abstract:Accurate rooftop detection from historical aerial imagery is vital for examining long-term urban development and human settlement patterns. However, black-and-white analog photographs pose significant challenges for modern object detection frameworks due to their limited spatial resolution, lack of color information, and archival degradation. To address these limitations, this study introduces a two-stage image enhancement pipeline based on Generative Adversarial Networks (GANs): image colorization using DeOldify, followed by super-resolution enhancement with Real-ESRGAN. The enhanced images were then used to train and evaluate rooftop detection models, including Faster R-CNN, DETReg, and YOLOv11n. Results show that combining colorization with super-resolution substantially improves detection performance, with YOLOv11n achieving a mean Average Precision (mAP) exceeding 85%. This reflects an improvement of approximately 40% over original black-and-white images and 20% over images enhanced through colorization alone. The proposed method effectively bridges the gap between archival imagery and contemporary deep learning techniques, enabling more reliable extraction of building footprints from historical aerial photographs.

RoboBERT: An End-to-end Multimodal Robotic Manipulation Model

Feb 11, 2025

Sicheng Wang, Jianhua Shan, Jianwei Zhang, Haozhang Gao, Hailiang Han, Yipeng Chen, Kang Wei, Chengkun Zhang, Kairos Wong, Jie Zhao(+2 more)

Abstract:Embodied intelligence integrates multiple modalities, enabling agents to understand images, language, and actions simultaneously. However, existing models always depend on additional datasets or extensive pre-training to maximize performance improvements, consuming abundant training time and expensive hardware cost. To tackle this issue, we present RoboBERT, a novel end-to-end robotic manipulation model integrated with a unique training strategy. This model utilizes a CNN-based diffusion policy, enhancing and stabilizing the effectiveness of this model by separating training processes for different modalities. It also underscores the importance of data augmentation, verifying various techniques to significantly boost performance. Unlike models that depend on extra data or large foundation models, RoboBERT achieves a highly competitive success rate while using only language-labeled expert demonstrations and maintaining a relatively smaller model size. Specifically, RoboBERT achieves an average length of 4.52 on the CALVIN benchmark for \(ABCD \rightarrow D\) task, setting a new state-of-the-art (SOTA) record. Furthermore, when tested on a real robot, the model demonstrates superior performance, achieving a higher success rate than other methods trained with the same data. We propose that these concepts and methodologies of RoboBERT demonstrate extensive versatility and compatibility, contributing significantly to the development of lightweight multimodal robotic models. The code can be accessed on https://github.com/PeterWangsicheng/RoboBERT

Physics-Grounded Differentiable Simulation for Soft Growing Robots

Jan 29, 2025

Lucas Chen, Yitian Gao, Sicheng Wang, Francesco Fuentes, Laura H. Blumenschein, Zachary Kingston

Abstract:Soft-growing robots (i.e., vine robots) are a promising class of soft robots that allow for navigation and growth in tightly confined environments. However, these robots remain challenging to model and control due to the complex interplay of the inflated structure and inextensible materials, which leads to obstacles for autonomous operation and design optimization. Although there exist simulators for these systems that have achieved qualitative and quantitative success in matching high-level behavior, they still often fail to capture realistic vine robot shapes using simplified parameter models and have difficulties in high-throughput simulation necessary for planning and parameter optimization. We propose a differentiable simulator for these systems, enabling the use of the simulator "in-the-loop" of gradient-based optimization approaches to address the issues listed above. With the more complex parameter fitting made possible by this approach, we experimentally validate and integrate a closed-form nonlinear stiffness model for thin-walled inflated tubes based on a first-principles approach to local material wrinkling. Our simulator also takes advantage of data-parallel operations by leveraging existing differentiable computation frameworks, allowing multiple simultaneous rollouts. We demonstrate the feasibility of using a physics-grounded nonlinear stiffness model within our simulator, and how it can be an effective tool in sim-to-real transfer. We provide our implementation open source.

* 8 pages, 7 figures. IEEE-RAS International Conference on Soft Robotics (RoboSoft) 2025

Anisotropic Stiffness and Programmable Actuation for Soft Robots Enabled by an Inflated Rotational Joint

Oct 16, 2024

Sicheng Wang, Eugenio Frias-Miranda, Antonio Alvarez Valdivia, Laura H. Blumenschein

Abstract:Soft robots are known for their ability to perform tasks with great adaptability, enabled by their distributed, non-uniform stiffness and actuation. Bending is the most fundamental motion for soft robot design, but creating a robust and easy-to-fabricate soft bending joint with tunable properties remains an active research problem. In this work, we demonstrate an inflatable actuation module for soft robots with a defined bending plane enabled by forced partial wrinkling. This lowers the structural stiffness in the bending direction, with the final stiffness easily designed by the ratio of wrinkled and unwrinkled regions. We present models and experimental characterization showing the stiffness properties of the actuation module, as well as its ability to maintain the kinematic constraint over a large range of loading conditions. We demonstrate the potential for complex actuation in a soft continuum robot and for decoupling actuation force and efficiency from load capacity. The module provides a novel method for embedding intelligent actuation into soft pneumatic robots.

How Does Diverse Interpretability of Textual Prompts Impact Medical Vision-Language Zero-Shot Tasks?

Aug 31, 2024

Sicheng Wang, Che Liu, Rossella Arcucci

Abstract:Recent advancements in medical vision-language pre-training (MedVLP) have significantly enhanced zero-shot medical vision tasks such as image classification by leveraging large-scale medical image-text pair pre-training. However, the performance of these tasks can be heavily influenced by the variability in textual prompts describing the categories, necessitating robustness in MedVLP models to diverse prompt styles. Yet, this sensitivity remains underexplored. In this work, we are the first to systematically assess the sensitivity of three widely-used MedVLP methods to a variety of prompts across 15 different diseases. To achieve this, we designed six unique prompt styles to mirror real clinical scenarios, which were subsequently ranked by interpretability. Our findings indicate that all MedVLP models evaluated show unstable performance across different prompt styles, suggesting a lack of robustness. Additionally, the models' performance varied with increasing prompt interpretability, revealing difficulties in comprehending complex medical concepts. This study underscores the need for further development in MedVLP methodologies to enhance their robustness to diverse zero-shot prompts.

Conformer-Based Speech Recognition On Extreme Edge-Computing Devices

Dec 16, 2023

Mingbin Xu, Alex Jin, Sicheng Wang, Mu Su, Tim Ng, Henry Mason, Shiyi Han, Yaqiao Deng, Zhen Huang, Mahesh Krishnamoorthy

Abstract:With increasingly more powerful compute capabilities and resources in today's devices, traditionally compute-intensive automatic speech recognition (ASR) has been moving from the cloud to devices to better protect user privacy. However, it is still challenging to implement on-device ASR on resource-constrained devices, such as smartphones, smart wearables, and other small home automation devices. In this paper, we propose a series of model architecture adaptions, neural network graph transformations, and numerical optimizations to fit an advanced Conformer based end-to-end streaming ASR system on resource-constrained devices without accuracy degradation. We achieve over 5.26 times faster than realtime (0.19 RTF) speech recognition on small wearables while minimizing energy consumption and achieving state-of-the-art accuracy. The proposed methods are widely applicable to other transformer-based server-free AI applications. In addition, we provide a complete theory on optimal pre-normalizers that numerically stabilize layer normalization in any Lp-norm using any floating point precision.
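The two speed figures quoted in this abstract are the same quantity expressed two ways. The sketch below assumes the common definition of real-time factor (processing time divided by audio duration), which the abstract does not spell out; the function names are illustrative and are not taken from the paper.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF under the assumed definition: processing time / audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def speedup_over_realtime(rtf: float) -> float:
    """How many times faster than real time a given RTF corresponds to."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return 1.0 / rtf


if __name__ == "__main__":
    # Hypothetical example: 1.9 s of compute for 10 s of audio gives RTF 0.19.
    rtf = real_time_factor(1.9, 10.0)
    print(f"RTF = {rtf:.2f}, speedup = {speedup_over_realtime(rtf):.2f}x")
```

Under this definition, an RTF of 0.19 corresponds to a speedup of 1 / 0.19, which is slightly above 5.26 and matches the "over 5.26 times faster than realtime" wording.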

CT-MVSNet: Efficient Multi-View Stereo with Cross-scale Transformer

Dec 14, 2023

Sicheng Wang, Hao Jiang, Lei Xiang

Abstract:Recent deep multi-view stereo (MVS) methods have widely incorporated transformers into cascade networks for high-resolution depth estimation, achieving impressive results. However, existing transformer-based methods are constrained by their computational costs, preventing their extension to finer stages. In this paper, we propose a novel cross-scale transformer (CT) that processes feature representations at different stages without additional computation. Specifically, we introduce an adaptive matching-aware transformer (AMT) that employs different interactive attention combinations at multiple scales. This combined strategy enables our network to capture intra-image context information and enhance inter-image feature relationships. Besides, we present a dual-feature guided aggregation (DFGA) that embeds the coarse global semantic information into the finer cost volume construction to further strengthen global and local feature awareness. Meanwhile, we design a feature metric loss (FM Loss) that evaluates the feature bias before and after transformation to reduce the impact of feature mismatch on depth estimation. Extensive experiments on the DTU dataset and the Tanks and Temples (T&T) benchmark demonstrate that our method achieves state-of-the-art results. Code is available at https://github.com/wscstrive/CT-MVSNet.

* Accepted at the 30th International Conference on Multimedia Modeling (MMM 2024)

Understanding Bugs in Multi-Language Deep Learning Frameworks

Mar 05, 2023

Zengyang Li, Sicheng Wang, Wenshuo Wang, Peng Liang, Ran Mo, Bing Li

Abstract:Deep learning frameworks (DLFs) have been playing an increasingly important role in this intelligence age since they act as a basic infrastructure for an increasingly wide range of AI-based applications. Meanwhile, as multi-programming-language (MPL) software systems, DLFs inevitably suffer from bugs caused by the use of multiple programming languages (PLs). Hence, it is of paramount significance to understand the bugs (especially the bugs involving multiple PLs, i.e., MPL bugs) of DLFs, which can provide a foundation for preventing, detecting, and resolving bugs in the development of DLFs. To this end, we manually analyzed 1497 bugs in three MPL DLFs, namely MXNet, PyTorch, and TensorFlow. First, we classified bugs in these DLFs into 12 types (e.g., algorithm design bugs and memory bugs) according to their bug labels and characteristics. Second, we further explored the impacts of different bug types on the development of DLFs, and found that deployment bugs and memory bugs have the greatest negative impact on the development of DLFs, in different aspects. Third, we found that 28.6%, 31.4%, and 16.0% of bugs in MXNet, PyTorch, and TensorFlow are MPL bugs, respectively; the PL combination of Python and C/C++ is the most used, appearing in the fixes of more than 92% of MPL bugs in all DLFs. Finally, the code change complexity of MPL bug fixes is significantly greater than that of single-programming-language (SPL) bug fixes in all three DLFs, while in PyTorch MPL bug fixes have longer open times and greater communication complexity than SPL bug fixes. These results provide insights for bug management in DLFs.

* The 31st IEEE/ACM International Conference on Program Comprehension (ICPC)

The Folded Pneumatic Artificial Muscle (foldPAM): Towards Programmability and Control via End Geometry

Sep 03, 2022

Sicheng Wang, Eugenio Frias Miranda, Laura H. Blumenschein

Abstract:Soft pneumatic actuators have seen applications in many soft robotic systems, and their pressure-driven nature presents unique challenges and opportunities for controlling their motion. In this work, we present a new concept: designing and controlling pneumatic actuators via end geometry. We demonstrate a novel actuator class, named the folded Pneumatic Artificial Muscle (foldPAM), which features a thin-filmed air pouch that is symmetrically folded on each side. Varying the folded portion of the actuator changes the end constraints and, hence, the force-strain relationships. We investigated this change experimentally by measuring the force-strain relationship of individual foldPAM units with various lengths and amounts of folding. In addition to static-geometry units, an actuated foldPAM device was designed to produce continuous, on-demand adjustment of the end geometry, enabling closed-loop position control while maintaining constant pressure. Experiments with the device indicate that geometry control allows access to different areas on the force-strain plane and that closed-loop geometry control can achieve errors within 0.5% of the actuation range.

* Manuscript submitted to IEEE Robotics and Automation Letters

Growing Deep Forests Efficiently with Soft Routing and Learned Connectivity

Dec 29, 2020

Jianghao Shen, Sicheng Wang, Zhangyang Wang

Abstract:Despite the latest prevailing success of deep neural networks (DNNs), several concerns have been raised against their usage, including the lack of interpretability, the gap between DNNs and other well-established machine learning models, and the increasingly expensive computational costs. A number of recent works [1], [2], [3] explored the alternative of sequentially stacking decision tree/random forest building blocks in a purely feed-forward way, with no need of back propagation. Since decision trees enjoy inherent reasoning transparency, such deep forest models can also facilitate the understanding of the internal decision making process. This paper further extends the deep forest idea in several important aspects. First, we employ a probabilistic tree whose nodes make probabilistic routing decisions, a.k.a. soft routing, rather than hard binary decisions. Besides enhancing the flexibility, it also enables non-greedy optimization for each tree. Second, we propose an innovative topology learning strategy: every node in the tree now maintains a new learnable hyperparameter indicating the probability that it will be a leaf node. In that way, the tree will jointly optimize both its parameters and the tree topology during training. Experiments on the MNIST dataset demonstrate that our empowered deep forests can achieve performance better than or comparable to [1], [3], with dramatically reduced model complexity. For example, our model with only 1 layer of 15 trees can perform comparably with the model in [3] with 2 layers of 2000 trees each.
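The soft routing idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a complete binary tree stored in heap order, a logistic gate over a linear split at each internal node, and fixed class distributions at the leaves, and it leaves out the learned leaf-probability (topology) part entirely. All names are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def leaf_probabilities(x: Sequence[float],
                       weights: Sequence[Sequence[float]],
                       biases: Sequence[float],
                       depth: int) -> List[float]:
    """Probability of reaching each leaf of a complete binary tree.

    Assumed layout (heap order): internal node i has children 2i+1 (left)
    and 2i+2 (right). Each internal node routes right with probability
    sigmoid(w_i . x + b_i) and left with the complement, so a sample
    reaches every leaf with some probability instead of exactly one leaf.
    """
    n_internal = 2 ** depth - 1
    reach = [0.0] * (2 ** (depth + 1) - 1)
    reach[0] = 1.0  # every sample starts at the root
    for i in range(n_internal):
        score = sum(w * v for w, v in zip(weights[i], x)) + biases[i]
        p_right = sigmoid(score)
        reach[2 * i + 1] = reach[i] * (1.0 - p_right)
        reach[2 * i + 2] = reach[i] * p_right
    return reach[n_internal:]


def predict(x, weights, biases, depth, leaf_class_probs) -> List[float]:
    """Mix the leaf class distributions by the leaf reach probabilities."""
    leaves = leaf_probabilities(x, weights, biases, depth)
    n_classes = len(leaf_class_probs[0])
    return [sum(p * dist[c] for p, dist in zip(leaves, leaf_class_probs))
            for c in range(n_classes)]


if __name__ == "__main__":
    # Depth-2 tree with all-zero gates: every leaf is reached with equal probability.
    w = [[0.0, 0.0]] * 3
    b = [0.0] * 3
    print(leaf_probabilities([1.0, -2.0], w, b, depth=2))
```

Because every routing probability is a smooth function of the node parameters, the output of `predict` is differentiable in them, which is what allows a tree of this kind to be optimized jointly rather than greedily, split by split.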

* ICDM Workshops 2018: 399-402