Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Sungho Jo

MVS-GS: High-Quality 3D Gaussian Splatting Mapping via Online Multi-View Stereo

Dec 26, 2024

Byeonggwon Lee, Junkyu Park, Khang Truong Giang, Sungho Jo, Soohwan Song

Figure 1 for MVS-GS: High-Quality 3D Gaussian Splatting Mapping via Online Multi-View Stereo

Figure 2 for MVS-GS: High-Quality 3D Gaussian Splatting Mapping via Online Multi-View Stereo

Figure 3 for MVS-GS: High-Quality 3D Gaussian Splatting Mapping via Online Multi-View Stereo

Figure 4 for MVS-GS: High-Quality 3D Gaussian Splatting Mapping via Online Multi-View Stereo

Abstract:This study addresses the challenge of online 3D model generation for neural rendering using an RGB image stream. Previous research has tackled this issue by incorporating Neural Radiance Fields (NeRF) or 3D Gaussian Splatting (3DGS) as scene representations within dense SLAM methods. However, most studies focus primarily on estimating coarse 3D scenes rather than achieving detailed reconstructions. Moreover, depth estimation based solely on images is often ambiguous, resulting in low-quality 3D models that lead to inaccurate renderings. To overcome these limitations, we propose a novel framework for high-quality 3DGS modeling that leverages an online multi-view stereo (MVS) approach. Our method estimates MVS depth using sequential frames from a local time window and applies comprehensive depth refinement techniques to filter out outliers, enabling accurate initialization of Gaussians in 3DGS. Furthermore, we introduce a parallelized backend module that optimizes the 3DGS model efficiently, ensuring timely updates with each new keyframe. Experimental results demonstrate that our method outperforms state-of-the-art dense SLAM methods, particularly excelling in challenging outdoor environments.

* 7 pages, 6 figures, submitted to IEEE ICRA 2025

Via

Access Paper or Ask Questions

Learning to Produce Semi-dense Correspondences for Visual Localization

Feb 13, 2024

Khang Truong Giang, Soohwan Song, Sungho Jo

Abstract:This study addresses the challenge of performing visual localization in demanding conditions such as night-time scenarios, adverse weather, and seasonal changes. While many prior studies have focused on improving image-matching performance to facilitate reliable dense keypoint matching between images, existing methods often heavily rely on predefined feature points on a reconstructed 3D model. Consequently, they tend to overlook unobserved keypoints during the matching process. Therefore, dense keypoint matches are not fully exploited, leading to a notable reduction in accuracy, particularly in noisy scenes. To tackle this issue, we propose a novel localization method that extracts reliable semi-dense 2D-3D matching points based on dense keypoint matches. This approach involves regressing semi-dense 2D keypoints into 3D scene coordinates using a point inference network. The network utilizes both geometric and visual cues to effectively infer 3D coordinates for unobserved keypoints from the observed ones. The abundance of matching information significantly enhances the accuracy of camera pose estimation, even in scenarios involving noisy or sparse 3D models. Comprehensive evaluations demonstrate that the proposed method outperforms other methods in challenging scenes and achieves competitive results in large-scale visual localization benchmarks. The code will be available.

* 17 pages, 9 figures, 6 tables

Via

Access Paper or Ask Questions

TopicFM+: Boosting Accuracy and Efficiency of Topic-Assisted Feature Matching

Jul 02, 2023

Khang Truong Giang, Soohwan Song, Sungho Jo

Abstract:This study tackles the challenge of image matching in difficult scenarios, such as scenes with significant variations or limited texture, with a strong emphasis on computational efficiency. Previous studies have attempted to address this challenge by encoding global scene contexts using Transformers. However, these approaches suffer from high computational costs and may not capture sufficient high-level contextual information, such as structural shapes or semantic instances. Consequently, the encoded features may lack discriminative power in challenging scenes. To overcome these limitations, we propose a novel image-matching method that leverages a topic-modeling strategy to capture high-level contexts in images. Our method represents each image as a multinomial distribution over topics, where each topic represents a latent semantic instance. By incorporating these topics, we can effectively capture comprehensive context information and obtain discriminative and high-quality features. Additionally, our method effectively matches features within corresponding semantic regions by estimating the covisible topics. To enhance the efficiency of feature matching, we have designed a network with a pooling-and-merging attention module. This module reduces computation by employing attention only on fixed-sized topics and small-sized features. Through extensive experiments, we have demonstrated the superiority of our method in challenging scenarios. Specifically, our method significantly reduces computational costs while maintaining higher image-matching accuracy compared to state-of-the-art methods. The code will be updated soon at https://github.com/TruongKhang/TopicFM

* Paper extension of TopicFM (arXiv:2207.00328)

Via

Access Paper or Ask Questions

TopicFM: Robust and Interpretable Feature Matching with Topic-assisted

Jul 01, 2022

Khang Truong Giang, Soohwan Song, Sungho Jo

Figure 1 for TopicFM: Robust and Interpretable Feature Matching with Topic-assisted

Figure 2 for TopicFM: Robust and Interpretable Feature Matching with Topic-assisted

Figure 3 for TopicFM: Robust and Interpretable Feature Matching with Topic-assisted

Figure 4 for TopicFM: Robust and Interpretable Feature Matching with Topic-assisted

Abstract:Finding correspondences across images is an important task in many visual applications. Recent state-of-the-art methods focus on end-to-end learning-based architectures designed in a coarse-to-fine manner. They use a very deep CNN or multi-block Transformer to learn robust representation, which requires high computation power. Moreover, these methods learn features without reasoning about objects, shapes inside images, thus lacks of interpretability. In this paper, we propose an architecture for image matching which is efficient, robust, and interpretable. More specifically, we introduce a novel feature matching module called TopicFM which can roughly organize same spatial structure across images into a topic and then augment the features inside each topic for accurate matching. To infer topics, we first learn global embedding of topics and then use a latent-variable model to detect-then-assign the image structures into topics. Our method can only perform matching in co-visibility regions to reduce computations. Extensive experiments in both outdoor and indoor datasets show that our method outperforms the recent methods in terms of matching performance and computational efficiency. The code is available at https://github.com/TruongKhang/TopicFM.

* in preparation

Via

Access Paper or Ask Questions

Curvature-guided dynamic scale networks for Multi-view Stereo

Dec 11, 2021

Khang Truong Giang, Soohwan Song, Sungho Jo

Figure 1 for Curvature-guided dynamic scale networks for Multi-view Stereo

Figure 2 for Curvature-guided dynamic scale networks for Multi-view Stereo

Figure 3 for Curvature-guided dynamic scale networks for Multi-view Stereo

Figure 4 for Curvature-guided dynamic scale networks for Multi-view Stereo

Abstract:Multi-view stereo (MVS) is a crucial task for precise 3D reconstruction. Most recent studies tried to improve the performance of matching cost volume in MVS by designing aggregated 3D cost volumes and their regularization. This paper focuses on learning a robust feature extraction network to enhance the performance of matching costs without heavy computation in the other steps. In particular, we present a dynamic scale feature extraction network, namely, CDSFNet. It is composed of multiple novel convolution layers, each of which can select a proper patch scale for each pixel guided by the normal curvature of the image surface. As a result, CDFSNet can estimate the optimal patch scales to learn discriminative features for accurate matching computation between reference and source images. By combining the robust extracted features with an appropriate cost formulation strategy, our resulting MVS architecture can estimate depth maps more precisely. Extensive experiments showed that the proposed method outperforms other state-of-the-art methods on complex outdoor scenes. It significantly improves the completeness of reconstructed models. As a result, the method can process higher resolution inputs within faster run-time and lower memory than other MVS methods. Our source code is available at url{https://github.com/TruongKhang/cds-mvsnet}.

Via

Access Paper or Ask Questions

Improved explanatory efficacy on human affect and workload through interactive process in artificial intelligence

Dec 13, 2019

Byung Hyung Kim, Seunghun Koh, Sejoon Huh, Sungho Jo

Figure 1 for Improved explanatory efficacy on human affect and workload through interactive process in artificial intelligence

Figure 2 for Improved explanatory efficacy on human affect and workload through interactive process in artificial intelligence

Figure 3 for Improved explanatory efficacy on human affect and workload through interactive process in artificial intelligence

Figure 4 for Improved explanatory efficacy on human affect and workload through interactive process in artificial intelligence

Abstract:Despite recent advances in the field of explainable artificial intelligence systems, a concrete quantitative measure for evaluating the usability of such systems is nonexistent. Ensuring the success of an explanatory interface in interacting with users requires a cyclic, symbiotic relationship between human and artificial intelligence. We, therefore, propose explanatory efficacy, a novel metric for evaluating the strength of the cyclic relationship the interface exhibits. Furthermore, in a user study, we evaluated the perceived affect and workload and recorded the EEG signals of our participants as they interacted with our custom-built, iterative explanatory interface to build personalized recommendation systems. We found that systems for perceptually driven iterative tasks with greater explanatory efficacy are characterized by statistically significant hemispheric differences in neural signals, indicating the feasibility of neural correlates as a measure of explanatory efficacy. These findings are beneficial for researchers who aim to study the circular ecosystem of the human-artificial intelligence partnership.

Via

Access Paper or Ask Questions

Wearable Affective Life-Log System for Understanding Emotion Dynamics in Daily Life

Nov 05, 2019

Byung Hyung Kim, Sungho Jo

Figure 1 for Wearable Affective Life-Log System for Understanding Emotion Dynamics in Daily Life

Figure 2 for Wearable Affective Life-Log System for Understanding Emotion Dynamics in Daily Life

Figure 3 for Wearable Affective Life-Log System for Understanding Emotion Dynamics in Daily Life

Figure 4 for Wearable Affective Life-Log System for Understanding Emotion Dynamics in Daily Life

Abstract:Past research on recognizing human affect has made use of a variety of physiological sensors in many ways. Nonetheless, how affective dynamics are influenced in the context of human daily life has not yet been explored. In this work, we present a wearable affective life-log system (ALIS), that is robust as well as easy to use in daily life to detect emotional changes and determine their cause-and-effect relationship on users' lives. The proposed system records how a user feels in certain situations during long-term activities with physiological sensors. Based on the long-term monitoring, the system analyzes how the contexts of the user's life affect his/her emotion changes. Furthermore, real-world experimental results demonstrate that the proposed wearable life-log system enables us to build causal structures to find effective stress relievers suited to every stressful situation in school life.

Via

Access Paper or Ask Questions

An Affective Situation Labeling System from Psychological Behaviors in Emotion Recognition

Nov 05, 2019

Byung Hyung Kim, Sungho Jo

Figure 1 for An Affective Situation Labeling System from Psychological Behaviors in Emotion Recognition

Figure 2 for An Affective Situation Labeling System from Psychological Behaviors in Emotion Recognition

Figure 3 for An Affective Situation Labeling System from Psychological Behaviors in Emotion Recognition

Figure 4 for An Affective Situation Labeling System from Psychological Behaviors in Emotion Recognition

Abstract:This paper presents a computational framework for providing affective labels to real-life situations, called A-Situ. We first define an affective situation, as a specific arrangement of affective entities relevant to emotion elicitation in a situation. Then, the affective situation is represented as a set of labels in the valence-arousal emotion space. Based on physiological behaviors in response to a situation, the proposed framework quantifies the expected emotion evoked by the interaction with a stimulus event. The accumulated result in a spatiotemporal situation is represented as a polynomial curve called the affective curve, which bridges the semantic gap between cognitive and affective perception in real-world situations. We show the efficacy of the curve for reliable emotion labeling in real-world experiments, respectively concerning 1) a comparison between the results from our system and existing explicit assessments for measuring emotion, 2) physiological distinctiveness in emotional states, and 3) physiological characteristics correlated to continuous labels. The efficiency of affective curves to discriminate emotional states is evaluated through subject-dependent classification performance using bicoherence features to represent discrete affective states in the valence-arousal space. Furthermore, electroencephalography-based statistical analysis revealed the physiological correlates of the affective curves.

Via

Access Paper or Ask Questions