Francois Rameau

Exploring Patient Data Requirements in Training Effective AI Models for MRI-based Breast Cancer Classification

Feb 22, 2025

Solha Kang, Wesley De Neve, Francois Rameau, Utku Ozbulak

Abstract: The past decade has witnessed a substantial increase in the number of startups and companies offering AI-based solutions for clinical decision support in medical institutions. However, the critical nature of medical decision-making raises several concerns about relying on external software. Key issues include potential variations in image modalities and the medical devices used to obtain these images, potential legal issues, and adversarial attacks. Fortunately, the open-source nature of machine learning research has made foundation models publicly available and straightforward to use for medical applications. This accessibility allows medical institutions to train their own AI-based models, thereby mitigating the aforementioned concerns. Given this context, an important question arises: how much data do medical institutions need to train effective AI models? In this study, we explore this question in relation to breast cancer detection, a particularly contested area due to the prevalence of this disease, which affects approximately 1 in every 8 women. Through large-scale experiments with varying numbers of patients in the training set, we show that medical institutions do not need a decade's worth of MRI images to train an AI model that performs competitively with the state of the art, provided the model leverages foundation models. Furthermore, we observe that for patient counts greater than 50, the number of patients in the training set has a negligible impact on model performance, and that simple ensembles further improve the results without additional complexity.

* Accepted for publication in MICCAI 2024 Deep Breast Workshop on AI and Imaging for Diagnostic and Treatment Challenges in Breast Care
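
The "simple ensembles" mentioned in the abstract above can be illustrated with a minimal sketch that averages the predicted probabilities of several independently trained classifiers. The abstract does not say which ensembling rule the authors use, so plain averaging, the function names, and the 0.5 threshold are all assumptions made for illustration.

```python
from statistics import mean


def ensemble_probabilities(per_model_probs):
    """Average per-patient probabilities across models.

    per_model_probs: list of lists, one inner list per model, each holding
    one predicted probability per patient (hypothetical input layout).
    """
    n_patients = len(per_model_probs[0])
    if any(len(probs) != n_patients for probs in per_model_probs):
        raise ValueError("every model must score the same patients")
    # Average the i-th patient's score over all models.
    return [mean(model[i] for model in per_model_probs) for i in range(n_patients)]


def classify(probs, threshold=0.5):
    """Turn averaged probabilities into 0/1 labels (threshold is an assumption)."""
    return [int(p >= threshold) for p in probs]


# Three hypothetical models scoring the same three patients.
scores = [
    [0.9, 0.2, 0.6],
    [0.7, 0.4, 0.4],
    [0.8, 0.3, 0.7],
]
labels = classify(ensemble_probabilities(scores))
```

Averaging adds no trainable parameters, which is one way to read "without additional complexity"; other rules such as majority voting would fit the same description.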

360 in the Wild: Dataset for Depth Prediction and View Synthesis

Jun 27, 2024

Kibaek Park, Francois Rameau, Jaesik Park, In So Kweon

Abstract: The abundance of perspective camera datasets has facilitated the emergence of novel learning-based strategies for various tasks, such as camera localization, single-image depth estimation, and view synthesis. However, panoramic or omnidirectional image datasets that include essential information, such as pose and depth, are mostly built from synthetic scenes. In this work, we introduce a large-scale dataset of 360$^{\circ}$ videos in the wild. This dataset has been carefully scraped from the Internet and was captured at various locations worldwide. Hence, it exhibits highly diverse environments (e.g., indoor and outdoor) and contexts (e.g., with and without moving objects). Each of the 25K images constituting our dataset is provided with its respective camera pose and depth map. We illustrate the relevance of our dataset for two main tasks, namely single-image depth estimation and view synthesis.

In Defense of Pure 16-bit Floating-Point Neural Networks

May 18, 2023

Juyoung Yun, Byungkon Kang, Francois Rameau, Zhoulai Fu

Abstract: Reducing the number of bits needed to encode the weights and activations of neural networks is highly desirable, as it speeds up training and inference while reducing memory consumption. For these reasons, research in this area has attracted significant attention toward developing neural networks that leverage lower-precision computing, such as mixed-precision training. Interestingly, none of the existing approaches has investigated pure 16-bit floating-point settings. In this paper, we shed light on the overlooked efficiency of pure 16-bit floating-point neural networks. We provide a comprehensive theoretical analysis of the factors contributing to the differences observed between 16-bit and 32-bit models. We formalize the concepts of floating-point error and tolerance, enabling us to quantitatively explain the conditions under which a 16-bit model can closely approximate the results of its 32-bit counterpart. This theoretical exploration offers a perspective distinct from the literature, which attributes the success of low-precision neural networks to their regularization effect. This in-depth analysis is supported by an extensive series of experiments. Our findings demonstrate that pure 16-bit floating-point neural networks can achieve similar or even better performance than their mixed-precision and 32-bit counterparts. We believe the results presented in this paper will have significant implications for machine learning practitioners, offering an opportunity to reconsider using pure 16-bit networks in various applications.
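
To give a feel for the kind of floating-point error the abstract above refers to, here is a small sketch that rounds a value to IEEE 754 half precision and measures the relative error introduced. It is not the paper's formalization of error and tolerance; the helper names and the tolerance check are assumptions for illustration only.

```python
import struct

# Unit roundoff of IEEE 754 binary16 (10 stored mantissa bits, round to nearest).
HALF_UNIT_ROUNDOFF = 2.0 ** -11


def to_fp16(x: float) -> float:
    """Round x to the nearest half-precision value and return it as a Python float.

    struct's "e" format is IEEE 754 binary16; values too large for the
    format raise OverflowError.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]


def relative_error(x: float) -> float:
    """Relative error introduced by storing a nonzero x in half precision."""
    return abs(to_fp16(x) - x) / abs(x)


def within_tolerance(x: float, tol: float) -> bool:
    """Hypothetical check: is the 16-bit rounding error of x below tol?"""
    return relative_error(x) <= tol


# 0.1 is not exactly representable; its half-precision neighbour is slightly smaller.
x16 = to_fp16(0.1)
ok = within_tolerance(0.1, HALF_UNIT_ROUNDOFF)
```

For values in the normal range of the format, a single rounding stays within the unit roundoff; how such errors accumulate through a full network is what the paper's analysis addresses.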

Generative AI meets 3D: A Survey on Text-to-3D in AIGC Era

May 10, 2023

Chenghao Li, Chaoning Zhang, Atish Waghwase, Lik-Hang Lee, Francois Rameau, Yang Yang, Sung-Ho Bae, Choong Seon Hong

Abstract: Generative AI (AIGC, a.k.a. AI-generated content) has made remarkable progress in the past few years, and text-guided content generation is its most practical form since it enables interaction between human instructions and AIGC. Owing to developments in text-to-image as well as 3D modeling technologies (such as NeRF), text-to-3D has emerged as a new yet highly active research field. Our work conducts the first comprehensive survey on text-to-3D to help readers interested in this direction quickly catch up with its fast development. First, we introduce 3D data representations, including both Euclidean data and non-Euclidean data. On top of that, we introduce various foundation technologies and summarize how recent works combine them to realize satisfactory text-to-3D. Moreover, we summarize how text-to-3D technology is used in various applications, including avatar generation, texture generation, shape transformation, and scene generation.

InstaGraM: Instance-level Graph Modeling for Vectorized HD Map Learning

Jan 10, 2023

Juyeb Shin, Francois Rameau, Hyeonjun Jeong, Dongsuk Kum

Abstract: The construction of lightweight high-definition (HD) maps containing geometric and semantic information is of foremost importance for the large-scale deployment of autonomous driving. To automatically generate such a map from a set of images captured by a vehicle, most works formulate this mapping as a segmentation problem, which implies heavy post-processing to obtain the final vectorized representation. Alternative techniques can generate an HD map in an end-to-end manner but rely on computationally expensive auto-regressive models. To bring camera-based HD map generation to an applicable level, we propose InstaGraM, a fast end-to-end network generating a vectorized HD map via instance-level graph modeling of the map elements. Our strategy consists of three main stages: top-view feature extraction, detection of the road elements' vertices and edges, and conversion to a semantic vector representation. After top-down feature extraction, an encoder-decoder architecture is used to predict a set of vertices and edge maps of the road elements. Finally, these vertices and edge maps are associated through an attentional graph neural network generating a semantic vectorized map. Instead of relying on a common segmentation approach, we propose to regress distance transform maps, as they provide strong spatial relations and directional information between vertices. Comprehensive experiments on the nuScenes dataset show that our proposed network outperforms HDMapNet by 13.7 mAP and achieves accuracy comparable to VectorMapNet with 5x faster inference speed.
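
As a rough illustration of what a distance transform map is, the sketch below computes, for every cell of a small grid, the Euclidean distance to the nearest vertex. This is the textbook definition computed by brute force; the exact regression target used by InstaGraM (normalization, truncation, resolution) is not given in the abstract, and the function name is hypothetical.

```python
import math


def distance_transform(height, width, vertices):
    """Return a height x width grid of distances to the nearest vertex.

    vertices: list of (row, col) positions. Brute force, O(H * W * V),
    which is fine for a small illustrative grid.
    """
    if not vertices:
        raise ValueError("at least one vertex is required")
    return [
        [min(math.hypot(r - vr, c - vc) for vr, vc in vertices) for c in range(width)]
        for r in range(height)
    ]


# A 3 x 3 grid with a single vertex in the top-left corner.
dt = distance_transform(3, 3, [(0, 0)])
```

Unlike a binary segmentation mask, every cell of such a map carries information about how far away the nearest vertex is, which is consistent with the "spatial relations" the abstract attributes to this representation.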

Labeling Where Adapting Fails: Cross-Domain Semantic Segmentation with Point Supervision via Active Selection

Jun 04, 2022

Fei Pan, Francois Rameau, Junsik Kim, In So Kweon

Abstract: Training models dedicated to semantic segmentation requires a large amount of pixel-wise annotated data. Due to their costly nature, these annotations might not be available for the task at hand. To alleviate this problem, unsupervised domain adaptation approaches aim at aligning the feature distributions between the labeled source and the unlabeled target data. While these strategies lead to noticeable improvements, their effectiveness remains limited. To guide the domain adaptation task more efficiently, previous works attempted to include human interactions in this process in the form of sparse single-pixel annotations in the target data. In this work, we propose a new domain adaptation framework for semantic segmentation with annotated points via active selection. First, we conduct an unsupervised domain adaptation of the model; from this adaptation, we use an entropy-based uncertainty measurement for target point selection. Finally, to minimize the domain gap, we propose a domain adaptation framework utilizing these target points annotated by human annotators. Experimental results on benchmark datasets show the effectiveness of our methods against existing unsupervised domain adaptation approaches. The proposed pipeline is generic and can be included as an extra module in existing domain adaptation strategies.
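
The entropy-based selection step described above can be sketched as follows: compute the Shannon entropy of each pixel's predicted class distribution and send the most uncertain pixels to a human annotator. The abstract does not give the paper's exact selection rule, so the plain top-k ranking, the `budget` parameter, and the function names are assumptions.

```python
import math


def pixel_entropy(probs):
    """Shannon entropy (in nats) of one pixel's class probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_uncertain_points(prob_map, budget):
    """Pick the `budget` pixels with the highest predictive entropy.

    prob_map: H x W nested lists, each cell a list of class probabilities
    (hypothetical layout for a softmax output on a target-domain image).
    Returns a list of (row, col) coordinates, most uncertain first.
    """
    scored = [
        (pixel_entropy(cell), r, c)
        for r, row in enumerate(prob_map)
        for c, cell in enumerate(row)
    ]
    # Stable sort: ties keep their row-major order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(r, c) for _, r, c in scored[:budget]]


# A 2 x 2 image with two classes; the pixel at (0, 1) is maximally uncertain.
prob_map = [
    [[1.0, 0.0], [0.5, 0.5]],
    [[0.9, 0.1], [0.6, 0.4]],
]
points = select_uncertain_points(prob_map, budget=2)
```

Pixels where the adapted model is confident are skipped, so the annotation budget is spent where adaptation alone is least reliable.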

Keypoints Tracking via Transformer Networks

Mar 24, 2022

Oleksii Nasypanyi, Francois Rameau

Abstract: In this thesis, we propose a pioneering work on sparse keypoint tracking across images using transformer networks. While deep learning-based keypoint matching has been widely investigated using graph neural networks, and more recently transformer networks, these approaches remain too slow to operate in real time and are particularly sensitive to the poor repeatability of keypoint detectors. To address these shortcomings, we study the particular case of real-time and robust keypoint tracking. Specifically, we propose a novel architecture that ensures a fast and robust estimation of keypoint tracks between successive images of a video sequence. Our method takes advantage of a recent breakthrough in computer vision, namely visual transformer networks. It consists of two successive stages: a coarse matching followed by a fine localization of the predicted keypoint correspondences. Through various experiments, we demonstrate that our approach achieves competitive results and high robustness against adverse conditions, such as illumination changes, occlusions, and viewpoint differences.

PointMixer: MLP-Mixer for Point Cloud Understanding

Nov 27, 2021

Jaesung Choe, Chunghyun Park, Francois Rameau, Jaesik Park, In So Kweon

Abstract: MLP-Mixer has recently appeared as a new challenger to CNNs and transformers. Despite its simplicity compared to transformers, the concept of channel-mixing MLPs and token-mixing MLPs achieves noticeable performance in visual recognition tasks. Unlike images, point clouds are inherently sparse, unordered, and irregular, which limits the direct use of MLP-Mixer for point cloud understanding. In this paper, we propose PointMixer, a universal point set operator that facilitates information sharing among unstructured 3D points. By simply replacing token-mixing MLPs with a softmax function, PointMixer can "mix" features within/between point sets. By doing so, PointMixer can be broadly used in the network as inter-set mixing, intra-set mixing, and pyramid mixing. Extensive experiments show the competitive or superior performance of PointMixer in semantic segmentation, classification, and point reconstruction against transformer-based methods.

Deep Point Cloud Reconstruction

Nov 23, 2021

Jaesung Choe, Byeongin Joung, Francois Rameau, Jaesik Park, In So Kweon

Abstract: Point clouds obtained from 3D scanning are often sparse, noisy, and irregular. To cope with these issues, recent studies have separately addressed the densification, denoising, and completion of inaccurate point clouds. In this paper, we advocate that jointly solving these tasks leads to significant improvement in point cloud reconstruction. To this end, we propose a deep point cloud reconstruction network consisting of two stages: 1) a 3D sparse stacked-hourglass network for the initial densification and denoising, and 2) a refinement via transformers converting the discrete voxels into 3D points. In particular, we further improve the performance of the transformer with a newly proposed module called amplified positional encoding. This module is designed to amplify the magnitude of positional encoding vectors differently based on the points' distances for adaptive refinement. Extensive experiments demonstrate that our network achieves state-of-the-art performance among recent studies on the ScanNet, ICL-NUIM, and ShapeNetPart datasets. Moreover, we underline the ability of our network to generalize toward real-world and unseen scenes.

Attentive and Contrastive Learning for Joint Depth and Motion Field Estimation

Oct 13, 2021

Seokju Lee, Francois Rameau, Fei Pan, In So Kweon

Abstract: Estimating the motion of the camera together with the 3D structure of the scene from a monocular vision system is a complex task that often relies on the so-called scene rigidity assumption. When observing a dynamic environment, this assumption is violated, which leads to an ambiguity between the ego-motion of the camera and the motion of the objects. To solve this problem, we present a self-supervised learning framework for 3D object motion field estimation from monocular videos. Our contributions are two-fold. First, we propose a two-stage projection pipeline to explicitly disentangle the camera ego-motion and the object motions with a dynamics attention module, called DAM. Specifically, we design an integrated motion model that estimates the motion of the camera and of the objects in the first and second warping stages, respectively, controlled by the attention module through a shared motion encoder. Second, we propose an object motion field estimation through contrastive sample consensus, called CSAC, taking advantage of a weak semantic prior (bounding boxes from an object detector) and geometric constraints (each object respects the rigid body motion model). Experiments on KITTI, Cityscapes, and Waymo Open Dataset demonstrate the relevance of our approach and show that our method outperforms state-of-the-art algorithms for the tasks of self-supervised monocular depth estimation, object motion segmentation, monocular scene flow estimation, and visual odometry.

* ICCV 2021
