Mohsen Ghafoorian

FastCAD: Real-Time CAD Retrieval and Alignment from Scans and Videos

Mar 22, 2024

Florian Langer, Jihong Ju, Georgi Dikov, Gerhard Reitmayr, Mohsen Ghafoorian

Abstract: Digitising the 3D world into a clean, CAD model-based representation has important applications for augmented reality and robotics. Current state-of-the-art methods are computationally intensive as they individually encode each detected object and optimise CAD alignments in a second stage. In this work, we propose FastCAD, a real-time method that simultaneously retrieves and aligns CAD models for all objects in a given scene. In contrast to previous works, we directly predict alignment parameters and shape embeddings. We achieve high-quality shape retrievals by learning CAD embeddings in a contrastive learning framework and distilling those into FastCAD. Our single-stage method accelerates the inference time by a factor of 50 compared to other methods operating on RGB-D scans while outperforming them on the challenging Scan2CAD alignment benchmark. Further, our approach collaborates seamlessly with online 3D reconstruction techniques. This enables the real-time generation of precise CAD model-based reconstructions from videos at 10 FPS. Doing so, we significantly improve the Scan2CAD alignment accuracy in the video setting from 43.0% to 48.2% and the reconstruction accuracy from 22.9% to 29.6%.
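
The retrieval step mentioned in the abstract (predicting a shape embedding and looking up a matching CAD model) can be illustrated with a nearest-neighbour search over embeddings. This is only a minimal sketch under stated assumptions, not the authors' implementation: the use of cosine similarity, the function names `cosine_similarity` and `retrieve_cad`, the model ids, and the 3-dimensional toy embeddings are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve_cad(query_embedding, cad_embeddings):
    """Return the id of the CAD model whose embedding is most similar to the query.

    cad_embeddings maps a (hypothetical) CAD model id to its embedding vector.
    """
    return max(
        cad_embeddings,
        key=lambda cad_id: cosine_similarity(query_embedding, cad_embeddings[cad_id]),
    )


# Toy database of CAD embeddings (illustrative values only).
cad_db = {
    "chair_01": [0.9, 0.1, 0.0],
    "table_07": [0.1, 0.9, 0.2],
    "sofa_03": [0.0, 0.2, 0.9],
}

# Embedding predicted for one detected object (illustrative values only).
predicted = [0.8, 0.3, 0.1]
best_match = retrieve_cad(predicted, cad_db)
print(best_match)
```

In this toy setup the lookup is a linear scan; a real system with many CAD models would more likely use a batched or indexed similarity search.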

InterroGate: Learning to Share, Specialize, and Prune Representations for Multi-task Learning

Feb 26, 2024

Babak Ehteshami Bejnordi, Gaurav Kumar, Amelie Royer, Christos Louizos, Tijmen Blankevoort, Mohsen Ghafoorian

Abstract: Jointly learning multiple tasks with a unified model can improve accuracy and data efficiency, but it faces the challenge of task interference, where optimizing one task objective may inadvertently compromise the performance of another. A solution to mitigate this issue is to allocate task-specific parameters, free from interference, on top of shared features. However, manually designing such architectures is cumbersome, as practitioners need to balance the overall performance across all tasks against the higher computational cost induced by the newly added parameters. In this work, we propose InterroGate, a novel multi-task learning (MTL) architecture designed to mitigate task interference while optimizing inference computational efficiency. We employ a learnable gating mechanism to automatically balance the shared and task-specific representations while preserving the performance of all tasks. Crucially, the patterns of parameter sharing and specialization dynamically learned during training become fixed at inference, resulting in a static, optimized MTL architecture. Through extensive empirical evaluations, we demonstrate SoTA results on three MTL benchmarks using convolutional as well as transformer-based backbones on CelebA, NYUD-v2, and PASCAL-Context.

* Under review

PVP: Personalized Video Prior for Editable Dynamic Portraits using StyleGAN

Jun 29, 2023

Kai-En Lin, Alex Trevithick, Keli Cheng, Michel Sarkis, Mohsen Ghafoorian, Ning Bi, Gerhard Reitmayr, Ravi Ramamoorthi

Abstract: Portrait synthesis creates realistic digital avatars which enable users to interact with others in a compelling way. Recent advances in StyleGAN and its extensions have shown promising results in synthesizing photorealistic and accurate reconstructions of human faces. However, previous methods often focus on frontal face synthesis and most methods are not able to handle large head rotations due to the training data distribution of StyleGAN. In this work, our goal is to take as input a monocular video of a face, and create an editable dynamic portrait able to handle extreme head poses. The user can create novel viewpoints, edit the appearance, and animate the face. Our method utilizes pivotal tuning inversion (PTI) to learn a personalized video prior from a monocular video sequence. Then we can input pose and expression coefficients to MLPs and manipulate the latent vectors to synthesize different viewpoints and expressions of the subject. We also propose novel loss functions to further disentangle pose and expression in the latent space. Our algorithm shows much better performance than previous approaches on monocular video datasets, and it is also capable of running in real-time at 54 FPS on an RTX 3080.

* Project website: https://cseweb.ucsd.edu//~viscomp/projects/EGSR23PVP/

Multi-Task Edge Prediction in Temporally-Dynamic Video Graphs

Dec 06, 2022

Osman Ülger, Julian Wiederer, Mohsen Ghafoorian, Vasileios Belagiannis, Pascal Mettes

Abstract: Graph neural networks have been shown to learn effective node representations, enabling node-, link-, and graph-level inference. Conventional graph networks assume static relations between nodes, while relations between entities in a video often evolve over time, with nodes entering and exiting dynamically. In such temporally-dynamic graphs, a core problem is inferring the future state of spatio-temporal edges, which can constitute multiple types of relations. To address this problem, we propose MTD-GNN, a graph network for predicting temporally-dynamic edges for multiple types of relations. We propose a factorized spatio-temporal graph attention layer to learn dynamic node representations and present a multi-task edge prediction loss that models multiple relations simultaneously. The proposed architecture operates on top of scene graphs that we obtain from videos through object detection and spatio-temporal linking. Experimental evaluations on ActionGenome and CLEVRER show that modeling multiple relations in our temporally-dynamic graph network can be mutually beneficial, outperforming existing static and spatio-temporal graph neural networks, as well as state-of-the-art predicate classification methods.

* BMVC2022

Find it if You Can: End-to-End Adversarial Erasing for Weakly-Supervised Semantic Segmentation

Nov 09, 2020

Erik Stammes, Tom F. H. Runia, Michael Hofmann, Mohsen Ghafoorian

Abstract: Semantic segmentation is a task that traditionally requires a large dataset of pixel-level ground truth labels, which is time-consuming and expensive to obtain. Recent advancements in the weakly-supervised setting show that reasonable performance can be obtained by using only image-level labels. Classification is often used as a proxy task to train a deep neural network from which attention maps are extracted. However, the classification task needs only the minimum evidence to make predictions, hence it focuses on the most discriminative object regions. To overcome this problem, we propose a novel formulation of adversarial erasing of the attention maps. In contrast to previous adversarial erasing methods, we optimize two networks with opposing loss functions, which eliminates the requirement of certain suboptimal strategies; for instance, having multiple training steps that complicate the training process or a weight sharing policy between networks operating on different distributions that might be suboptimal for performance. The proposed solution does not require saliency masks; instead, it uses a regularization loss to prevent the attention maps from spreading to less discriminative object regions. Our experiments on the Pascal VOC dataset demonstrate that our adversarial approach increases segmentation performance by 2.1 mIoU compared to our baseline and by 1.0 mIoU compared to previous adversarial erasing approaches.
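
The mIoU figures quoted above refer to mean intersection-over-union, averaged over classes. The following is a minimal sketch of that metric on flattened label lists, assuming the common convention of skipping classes that appear in neither prediction nor ground truth; the function name `mean_iou` and the toy labels are illustrative, and benchmark tooling typically accumulates the counts over a whole dataset rather than per image.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes for two flat label sequences."""
    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both prediction and ground truth.
        intersection = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels labelled c in either prediction or ground truth.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both (assumed convention)
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six pixels, three classes.
pred = [0, 0, 1, 1, 1, 2]
target = [0, 1, 1, 1, 2, 2]
print(mean_iou(pred, target, num_classes=3))
```

Here each of the three classes has an IoU of 0.5, so the mean is 0.5; an improvement of "2.1 mIoU" in the abstract corresponds to 2.1 percentage points on this scale multiplied by 100.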

* 11 pages, 5 figures

3D Convolutional Neural Networks Image Registration Based on Efficient Supervised Learning from Artificial Deformations

Aug 27, 2019

Hessam Sokooti, Bob de Vos, Floris Berendsen, Mohsen Ghafoorian, Sahar Yousefi, Boudewijn P. F. Lelieveldt, Ivana Isgum, Marius Staring

Abstract: We propose a supervised nonrigid image registration method, trained using artificial displacement vector fields (DVF), for which we propose and compare three network architectures. The artificial DVFs allow training in a fully supervised and voxel-wise dense manner, but without the cost usually associated with the creation of densely labeled data. We propose a scheme to artificially generate DVFs, and for chest CT registration augment these with simulated respiratory motion. The proposed architectures are embedded in a multi-stage approach, to increase the capture range of the proposed networks in order to more accurately predict larger displacements. The proposed method, RegNet, is evaluated on multiple databases of chest CT scans and achieved a target registration error of 2.32 ± 5.33 mm and 1.86 ± 2.12 mm on SPREAD and DIR-Lab-4DCT studies, respectively. The average inference time of RegNet with two stages is about 2.2 s.
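
The target registration error (TRE) reported above is conventionally the Euclidean distance between corresponding landmarks after registration, summarised as mean ± standard deviation. A minimal sketch under that assumption follows; the function name, the landmark coordinates, and the choice of population standard deviation are illustrative and not taken from the paper.

```python
import math
import statistics


def target_registration_error(fixed_landmarks, warped_landmarks):
    """Return (mean, std) of Euclidean distances between paired landmarks, in input units."""
    distances = [math.dist(f, w) for f, w in zip(fixed_landmarks, warped_landmarks)]
    # Population standard deviation is an assumption; a paper may use the sample version.
    return statistics.mean(distances), statistics.pstdev(distances)


# Toy landmarks in millimetres (illustrative values only).
fixed = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
warped = [(3.0, 4.0, 0.0), (10.0, 0.0, 1.0)]

tre_mean, tre_std = target_registration_error(fixed, warped)
print(f"TRE: {tre_mean:.2f} ± {tre_std:.2f} mm")
```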

* TMI

I Bet You Are Wrong: Gambling Adversarial Networks for Structured Semantic Segmentation

Aug 07, 2019

Laurens Samson, Nanne van Noord, Olaf Booij, Michael Hofmann, Efstratios Gavves, Mohsen Ghafoorian

Abstract: Adversarial training has been recently employed for realizing structured semantic segmentation, in which the aim is to preserve higher-level scene structural consistencies in dense predictions. However, as we show, value-based discrimination between the predictions from the segmentation network and ground-truth annotations can hinder the training process from learning to improve structural qualities and can prevent the network from properly expressing uncertainties. In this paper, we rethink adversarial training for semantic segmentation and propose to formulate the fake/real discrimination framework with a correct/incorrect training objective. More specifically, we replace the discriminator with a "gambler" network that learns to spot and distribute its budget in areas where the predictions are clearly wrong, while the segmenter network tries to leave no clear clues for the gambler where to bet. Empirical evaluation on two road-scene semantic segmentation tasks shows that not only does the proposed method re-enable expressing uncertainties, it also improves pixel-wise and structure-based metrics.

* 13 pages, 8 figures

Standardized Assessment of Automatic Segmentation of White Matter Hyperintensities and Results of the WMH Segmentation Challenge

Apr 01, 2019

Hugo J. Kuijf, J. Matthijs Biesbroek, Jeroen de Bresser, Rutger Heinen, Simon Andermatt, Mariana Bento, Matt Berseth, Mikhail Belyaev, M. Jorge Cardoso, Adrià Casamitjana (+34 more)

Abstract: Quantification of cerebral white matter hyperintensities (WMH) of presumed vascular origin is of key importance in many neurological research studies. Currently, measurements are often still obtained from manual segmentations on brain MR images, which is a laborious procedure. Automatic WMH segmentation methods exist, but a standardized comparison of the performance of such methods is lacking. We organized a scientific challenge, in which developers could evaluate their method on a standardized multi-center/-scanner image dataset, giving an objective comparison: the WMH Segmentation Challenge (https://wmh.isi.uu.nl/). Sixty T1+FLAIR images from three MR scanners were released with manual WMH segmentations for training. A test set of 110 images from five MR scanners was used for evaluation. Segmentation methods had to be containerized and submitted to the challenge organizers. Five evaluation metrics were used to rank the methods: (1) Dice similarity coefficient, (2) modified Hausdorff distance (95th percentile), (3) absolute log-transformed volume difference, (4) sensitivity for detecting individual lesions, and (5) F1-score for individual lesions. Additionally, methods were ranked on their inter-scanner robustness. Twenty participants submitted their method for evaluation. This paper provides a detailed analysis of the results. In brief, there is a cluster of four methods that rank significantly better than the other methods, with one clear winner. The inter-scanner robustness ranking shows that not all methods generalize to unseen scanners. The challenge remains open for future submissions and provides a public platform for method evaluation.
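
The first ranking metric listed above, the Dice similarity coefficient, measures the overlap between a predicted and a reference binary mask as twice the intersection divided by the sum of the two mask sizes. The sketch below uses flattened 0/1 lists; the function name, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions for illustration and not the challenge's evaluation code.

```python
def dice_coefficient(pred, truth):
    """Dice similarity coefficient for two flat binary (0/1) masks of equal length."""
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Both masks empty: treated here as perfect agreement (assumed convention).
        return 1.0
    return 2.0 * overlap / total


# Toy masks: two of three predicted voxels overlap the three reference voxels.
pred_mask = [1, 1, 0, 0, 1]
true_mask = [1, 0, 0, 1, 1]
print(round(dice_coefficient(pred_mask, true_mask), 4))
```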

* Accepted for publication in IEEE Transactions on Medical Imaging

Comparison of U-net-based Convolutional Neural Networks for Liver Segmentation in CT

Oct 09, 2018

Hans Meine, Grzegorz Chlebus, Mohsen Ghafoorian, Itaru Endo, Andrea Schenk

Abstract: Various approaches for liver segmentation in CT have been proposed: besides statistical shape models, which played a major role in this research area, novel approaches on the basis of convolutional neural networks have been introduced recently. Using a set of 219 liver CT datasets with reference segmentations from liver surgery planning, we evaluate the performance of several neural network classifiers based on 2D and 3D U-net architectures. An interesting observation is that slice-wise approaches perform surprisingly well, with mean and median Dice coefficients above 0.97, and may be preferable to 3D approaches given current hardware and software limitations.

EL-GAN: Embedding Loss Driven Generative Adversarial Networks for Lane Detection

Jul 05, 2018

Mohsen Ghafoorian, Cedric Nugteren, Nóra Baka, Olaf Booij, Michael Hofmann

Abstract: Convolutional neural networks have been successfully applied to semantic segmentation problems. However, there are many problems that are inherently not pixel-wise classification problems but are nevertheless frequently formulated as semantic segmentation. This ill-posed formulation consequently necessitates hand-crafted, scenario-specific, and computationally expensive post-processing methods to convert the per-pixel probability maps to the final desired outputs. Generative adversarial networks (GANs) can be used to make the semantic segmentation network output more realistic or better structure-preserving, decreasing the dependency on potentially complex post-processing. In this work, we propose EL-GAN: a GAN framework to mitigate the discussed problem using an embedding loss. With EL-GAN, we discriminate based on learned embeddings of both the labels and the prediction at the same time. This substantially stabilizes the adversarial training process, owing to better discriminative information from seeing both 'fake' and 'real' predictions at the same time. We use the TuSimple lane marking challenge to demonstrate that with our proposed framework it is viable to overcome the inherent anomalies of posing it as a semantic segmentation problem. Not only is the output considerably more similar to the labels when compared to conventional methods, the subsequent post-processing is also simpler and crosses the competitive 96% accuracy threshold.

* 14 pages, 7 figures
