Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xiaoyi Jiang

Thermal Detection of People with Mobility Restrictions for Barrier Reduction at Traffic Lights Controlled Intersections

May 14, 2025

Xiao Ni, Carsten Kuehnel, Xiaoyi Jiang

Abstract:Rapid advances in deep learning for computer vision have driven the adoption of RGB camera-based adaptive traffic light systems to improve traffic safety and pedestrian comfort. However, these systems often overlook the needs of people with mobility restrictions. Moreover, the use of RGB cameras presents significant challenges, including limited detection performance under adverse weather or low-visibility conditions, as well as heightened privacy concerns. To address these issues, we propose a fully automated, thermal detector-based traffic light system that dynamically adjusts signal durations for individuals with walking impairments or mobility burden and triggers the auditory signal for visually impaired individuals, thereby advancing towards barrier-free intersection for all users. To this end, we build the thermal dataset for people with mobility restrictions (TD4PWMR), designed to capture diverse pedestrian scenarios, particularly focusing on individuals with mobility aids or mobility burden under varying environmental conditions, such as different lighting, weather, and crowded urban settings. While thermal imaging offers advantages in terms of privacy and robustness to adverse conditions, it also introduces inherent hurdles for object detection due to its lack of color and fine texture details and generally lower resolution of thermal images. To overcome these limitations, we develop YOLO-Thermal, a novel variant of the YOLO architecture that integrates advanced feature extraction and attention mechanisms for enhanced detection accuracy and robustness in thermal imaging. Experiments demonstrate that the proposed thermal detector outperforms existing detectors, while the proposed traffic light system effectively enhances barrier-free intersection. The source codes and dataset are available at https://github.com/leon2014dresden/YOLO-THERMAL.

Via

Access Paper or Ask Questions

Robustness of Generalized Median Computation for Consensus Learning in Arbitrary Spaces

Mar 07, 2025

Andreas Nienkötter, Sandro Vega-Pons, Xiaoyi Jiang

Abstract:Robustness in terms of outliers is an important topic and has been formally studied for a variety of problems in machine learning and computer vision. Generalized median computation is a special instance of consensus learning and a common approach to finding prototypes. Related research can be found in numerous problem domains with a broad range of applications. So far, however, robustness of generalized median has only been studied in a few specific spaces. To our knowledge, there is no robustness characterization in a general setting, i.e. for arbitrary spaces. We address this open issue in our work. The breakdown point >=0.5 is proved for generalized median with metric distance functions in general. We also study the detailed behavior in case of outliers from different perspectives. In addition, we present robustness results for weighted generalized median computation and non-metric distance functions. Given the importance of robustness, our work contributes to closing a gap in the literature. The presented results have general impact and applicability, e.g. providing deeper understanding of generalized median computation and practical guidance to avoid non-robust computation.

Via

Access Paper or Ask Questions

CVKAN: Complex-Valued Kolmogorov-Arnold Networks

Feb 04, 2025

Matthias Wolff, Florian Eilers, Xiaoyi Jiang

Abstract:In this work we propose CKAN, a complex-valued KAN, to join the intrinsic interpretability of KANs and the advantages of Complex-Valued Neural Networks (CVNNs). We show how to transfer a KAN and the necessary associated mechanisms into the complex domain. To confirm that CKAN meets expectations we conduct experiments on symbolic complex-valued function fitting and physically meaningful formulae as well as on a more realistic dataset from knot theory. Our proposed CKAN is more stable and performs on par or better than real-valued KANs while requiring less parameters and a shallower network architecture, making it more explainable.

Via

Access Paper or Ask Questions

Single Image Estimation of Cell Migration Direction by Deep Circular Regression

Jun 27, 2024

Lennart Bruns, Lucas Lamparter, Milos Galic, Xiaoyi Jiang

Figure 1 for Single Image Estimation of Cell Migration Direction by Deep Circular Regression

Figure 2 for Single Image Estimation of Cell Migration Direction by Deep Circular Regression

Figure 3 for Single Image Estimation of Cell Migration Direction by Deep Circular Regression

Figure 4 for Single Image Estimation of Cell Migration Direction by Deep Circular Regression

Abstract:In this paper we study the problem of estimating the migration direction of cells based on a single image. To the best of our knowledge, there is only one related work that uses a classification CNN for four classes (quadrants). This approach does not allow detailed directional resolution. We solve the single image estimation problem using deep circular regression with special attention to cycle-sensitive methods. On two databases we achieve an average accuracy of $\sim$17 degrees, which is a significant improvement over the previous work.

Via

Access Paper or Ask Questions

DeepCSHAP: Utilizing Shapley Values to Explain Deep Complex-Valued Neural Networks

Mar 13, 2024

Florian Eilers, Xiaoyi Jiang

Abstract:Deep Neural Networks are widely used in academy as well as corporate and public applications, including safety critical applications such as health care and autonomous driving. The ability to explain their output is critical for safety reasons as well as acceptance among applicants. A multitude of methods have been proposed to explain real-valued neural networks. Recently, complex-valued neural networks have emerged as a new class of neural networks dealing with complex-valued input data without the necessity of projecting them onto $\mathbb{R}^2$. This brings up the need to develop explanation algorithms for this kind of neural networks. In this paper we provide these developments. While we focus on adapting the widely used DeepSHAP algorithm to the complex domain, we also present versions of four gradient based explanation methods suitable for use in complex-valued neural networks. We evaluate the explanation quality of all presented algorithms and provide all of them as an open source library adaptable to most recent complex-valued neural network architectures.

* 14 Pages plus 4 Pages Appendix

Via

Access Paper or Ask Questions

A Data-Centric Approach To Generate Faithful and High Quality Patient Summaries with Large Language Models

Feb 23, 2024

Stefan Hegselmann, Shannon Zejiang Shen, Florian Gierse, Monica Agrawal, David Sontag, Xiaoyi Jiang

Figure 1 for A Data-Centric Approach To Generate Faithful and High Quality Patient Summaries with Large Language Models

Figure 2 for A Data-Centric Approach To Generate Faithful and High Quality Patient Summaries with Large Language Models

Figure 3 for A Data-Centric Approach To Generate Faithful and High Quality Patient Summaries with Large Language Models

Figure 4 for A Data-Centric Approach To Generate Faithful and High Quality Patient Summaries with Large Language Models

Abstract:Patients often face difficulties in understanding their hospitalizations, while healthcare workers have limited resources to provide explanations. In this work, we investigate the potential of large language models to generate patient summaries based on doctors' notes and study the effect of training data on the faithfulness and quality of the generated summaries. To this end, we develop a rigorous labeling protocol for hallucinations, and have two medical experts annotate 100 real-world summaries and 100 generated summaries. We show that fine-tuning on hallucination-free data effectively reduces hallucinations from 2.60 to 1.55 per summary for Llama 2, while preserving relevant information. Although the effect is still present, it is much smaller for GPT-4 when prompted with five examples (0.70 to 0.40). We also conduct a qualitative evaluation using hallucination-free and improved training data. GPT-4 shows very good results even in the zero-shot setting. We find that common quantitative metrics do not correlate well with faithfulness and quality. Finally, we test GPT-4 for automatic hallucination detection, which yields promising results.

Via

Access Paper or Ask Questions

Phase-Specific Augmented Reality Guidance for Microscopic Cataract Surgery Using Long-Short Spatiotemporal Aggregation Transformer

Sep 11, 2023

Puxun Tu, Hongfei Ye, Haochen Shi, Jeff Young, Meng Xie, Peiquan Zhao, Ce Zheng, Xiaoyi Jiang, Xiaojun Chen

Abstract:Phacoemulsification cataract surgery (PCS) is a routine procedure conducted using a surgical microscope, heavily reliant on the skill of the ophthalmologist. While existing PCS guidance systems extract valuable information from surgical microscopic videos to enhance intraoperative proficiency, they suffer from non-phasespecific guidance, leading to redundant visual information. In this study, our major contribution is the development of a novel phase-specific augmented reality (AR) guidance system, which offers tailored AR information corresponding to the recognized surgical phase. Leveraging the inherent quasi-standardized nature of PCS procedures, we propose a two-stage surgical microscopic video recognition network. In the first stage, we implement a multi-task learning structure to segment the surgical limbus region and extract limbus region-focused spatial feature for each frame. In the second stage, we propose the long-short spatiotemporal aggregation transformer (LS-SAT) network to model local fine-grained and global temporal relationships, and combine the extracted spatial features to recognize the current surgical phase. Additionally, we collaborate closely with ophthalmologists to design AR visual cues by utilizing techniques such as limbus ellipse fitting and regional restricted normal cross-correlation rotation computation. We evaluated the network on publicly available and in-house datasets, with comparison results demonstrating its superior performance compared to related works. Ablation results further validated the effectiveness of the limbus region-focused spatial feature extractor and the combination of temporal features. Furthermore, the developed system was evaluated in a clinical setup, with results indicating remarkable accuracy and real-time performance. underscoring its potential for clinical applications.

Via

Access Paper or Ask Questions

Building Blocks for a Complex-Valued Transformer Architecture

Jun 16, 2023

Florian Eilers, Xiaoyi Jiang

Abstract:Most deep learning pipelines are built on real-valued operations to deal with real-valued inputs such as images, speech or music signals. However, a lot of applications naturally make use of complex-valued signals or images, such as MRI or remote sensing. Additionally the Fourier transform of signals is complex-valued and has numerous applications. We aim to make deep learning directly applicable to these complex-valued signals without using projections into $\mathbb{R}^2$. Thus we add to the recent developments of complex-valued neural networks by presenting building blocks to transfer the transformer architecture to the complex domain. We present multiple versions of a complex-valued Scaled Dot-Product Attention mechanism as well as a complex-valued layer normalization. We test on a classification and a sequence generation task on the MusicNet dataset and show improved robustness to overfitting while maintaining on-par performance when compared to the real-valued transformer architecture.

* ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2023

Via

Access Paper or Ask Questions

TabLLM: Few-shot Classification of Tabular Data with Large Language Models

Oct 19, 2022

Stefan Hegselmann, Alejandro Buendia, Hunter Lang, Monica Agrawal, Xiaoyi Jiang, David Sontag

Figure 1 for TabLLM: Few-shot Classification of Tabular Data with Large Language Models

Figure 2 for TabLLM: Few-shot Classification of Tabular Data with Large Language Models

Figure 3 for TabLLM: Few-shot Classification of Tabular Data with Large Language Models

Figure 4 for TabLLM: Few-shot Classification of Tabular Data with Large Language Models

Abstract:We study the application of large language models to zero-shot and few-shot classification of tabular data. We prompt the large language model with a serialization of the tabular data to a natural-language string, together with a short description of the classification problem. In the few-shot setting, we fine-tune the large language model using some labeled examples. We evaluate several serialization methods including templates, table-to-text models, and large language models. Despite its simplicity, we find that this technique outperforms prior deep-learning-based tabular classification methods on several benchmark datasets. In most cases, even zero-shot classification obtains non-trivial performance, illustrating the method's ability to exploit prior knowledge encoded in large language models. Unlike many deep learning methods for tabular datasets, this approach is also competitive with strong traditional baselines like gradient-boosted trees, especially in the very-few-shot setting.

Via

Access Paper or Ask Questions

Kernel-Based Generalized Median Computation for Consensus Learning

Sep 21, 2022

Andreas Nienkötter, Xiaoyi Jiang

Figure 1 for Kernel-Based Generalized Median Computation for Consensus Learning

Figure 2 for Kernel-Based Generalized Median Computation for Consensus Learning

Figure 3 for Kernel-Based Generalized Median Computation for Consensus Learning

Figure 4 for Kernel-Based Generalized Median Computation for Consensus Learning

Abstract:Computing a consensus object from a set of given objects is a core problem in machine learning and pattern recognition. One popular approach is to formulate it as an optimization problem using the generalized median. Previous methods like the Prototype and Distance-Preserving Embedding methods transform objects into a vector space, solve the generalized median problem in this space, and inversely transform back into the original space. Both of these methods have been successfully applied to a wide range of object domains, where the generalized median problem has inherent high computational complexity (typically $\mathcal{NP}$-hard) and therefore approximate solutions are required. Previously, explicit embedding methods were used in the computation, which often do not reflect the spatial relationship between objects exactly. In this work we introduce a kernel-based generalized median framework that is applicable to both positive definite and indefinite kernels. This framework computes the relationship between objects and its generalized median in kernel space, without the need of an explicit embedding. We show that the spatial relationship between objects is more accurately represented in kernel space than in an explicit vector space using easy-to-compute kernels, and demonstrate superior performance of generalized median computation on datasets of three different domains. A software toolbox resulting from our work is made publicly available to encourage other researchers to explore the generalized median computation and applications.

* Early Access by TPAMI 2022 (https://ieeexplore.ieee.org/document/9869722)
* 17 pages, 5 figures, 7 tables

Via

Access Paper or Ask Questions