Abstract:Generic object detection is a category-independent task that relies on accurate modeling of objectness. Most relevant CNN-based models of objectness utilize loss functions (e.g., binary cross entropy) that focus on the single-response, i.e., the loss response of a single pixel. Inspired by the human visual system, which first discerns the boundaries of ambiguous regions (i.e., hard regions) before delving into the semantic meaning, we propose a novel loss function, Spatial Coherence Loss (SCLoss), that uses the mutual response between adjacent pixels to suppress or emphasize the single-response of pixels. We demonstrate that the proposed SCLoss can gradually learn the hard regions by detecting and emphasizing their boundaries. Through comprehensive experiments, we demonstrate that replacing popular loss functions with SCLoss can improve the performance of current state-of-the-art (SOTA) salient or camouflaged object detection (SOD or COD) models. We also demonstrate that combining SCLoss with other loss functions can further improve performance and result in the SOTA outcomes for different applications. Finally, as a demonstrative example of the potential uses for other related tasks, we show an application of SCLoss for semantic segmentation.
Abstract:Anatomical consistency in biomarker segmentation is crucial for many medical image analysis tasks. A promising paradigm for achieving anatomically consistent segmentation via deep networks is incorporating pixel connectivity, a basic concept in digital topology, to model inter-pixel relationships. However, previous works on connectivity modeling have ignored the rich channel-wise directional information in the latent space. In this work, we demonstrate that effective disentanglement of directional sub-space from the shared latent space can significantly enhance the feature representation in the connectivity-based network. To this end, we propose a directional connectivity modeling scheme for segmentation that decouples, tracks, and utilizes the directional information across the network. Experiments on various public medical image segmentation benchmarks show the effectiveness of our model as compared to the state-of-the-art methods. Code is available at https://github.com/Zyun-Y/DconnNet.
Abstract:Optical coherence tomography (OCT) helps ophthalmologists assess macular edema, accumulation of fluids, and lesions at microscopic resolution. Quantification of retinal fluids is necessary for OCT-guided treatment management, which relies on a precise image segmentation step. As manual analysis of retinal fluids is a time-consuming, subjective, and error-prone task, there is increasing demand for fast and robust automatic solutions. In this study, a new convolutional neural architecture named RetiFluidNet is proposed for multi-class retinal fluid segmentation. The model benefits from hierarchical representation learning of textural, contextual, and edge features using a new self-adaptive dual-attention (SDA) module, multiple self-adaptive attention-based skip connections (SASC), and a novel multi-scale deep self supervision learning (DSL) scheme. The attention mechanism in the proposed SDA module enables the model to automatically extract deformation-aware representations at different levels, and the introduced SASC paths further consider spatial-channel interdependencies for concatenation of counterpart encoder and decoder units, which improve representational capability. RetiFluidNet is also optimized using a joint loss function comprising a weighted version of dice overlap and edge-preserved connectivity-based losses, where several hierarchical stages of multi-scale local losses are integrated into the optimization process. The model is validated based on three publicly available datasets: RETOUCH, OPTIMA, and DUKE, with comparisons against several baselines. Experimental results on the datasets prove the effectiveness of the proposed model in retinal OCT fluid segmentation and reveal that the suggested method is more effective than existing state-of-the-art fluid segmentation algorithms in adapting to retinal OCT scans recorded by various image scanning instruments.
Abstract:Optical coherence tomography (OCT) has seen widespread success as an in vivo clinical diagnostic 3D imaging modality, impacting areas including ophthalmology, cardiology, and gastroenterology. Despite its many advantages, such as high sensitivity, speed, and depth penetration, OCT suffers from several shortcomings that ultimately limit its utility as a 3D microscopy tool, such as its pervasive coherent speckle noise and poor lateral resolution required to maintain millimeter-scale imaging depths. Here, we present 3D optical coherence refraction tomography (OCRT), a computational extension of OCT which synthesizes an incoherent contrast mechanism by combining multiple OCT volumes, acquired across two rotation axes, to form a resolution-enhanced, speckle-reduced, refraction-corrected 3D reconstruction. Our label-free computational 3D microscope features a novel optical design incorporating a parabolic mirror to enable the capture of 5D plenoptic datasets, consisting of millimetric 3D fields of view over up to $\pm75^\circ$ without moving the sample. We demonstrate that 3D OCRT reveals 3D features unobserved by conventional OCT in fruit fly, zebrafish, and mouse samples.
Abstract:Salient object detection (SOD) is viewed as a pixel-wise saliency modeling task by traditional deep learning-based methods. Although great progress has been made, a challenge of modern SOD models is the insufficient utilization of inter-pixel information, which usually results in imperfect segmentations near the edge regions. As we demonstrate, using a saliency map as the network output is a sub-optimal choice. To address this problem, we propose a connectivity-based approach named bilateral connectivity network (BiconNet), which uses a connectivity map instead of a saliency map as the network output for effective modeling of inter-pixel relationships and object saliency. Moreover, we propose a bilateral voting module to enhance the output connectivity map and a novel edge feature enhancement method that efficiently utilizes edge-specific features with negligible parameter increase. We show that our model can use any existing saliency-based SOD framework as its backbone. Through comprehensive experiments on five benchmark datasets, we demonstrate that our proposed method outperforms state-of-the-art SOD approaches.
Abstract:We propose a new method for modeling the distribution function of high dimensional extreme value distributions. The Pickands dependence function models the relationship between the covariates in the tails, and we learn this function using a neural network that is designed to satisfy its required properties. Moreover, we present new methods for recovering the spectral representation of extreme distributions and propose a generative model for sampling from extreme copulas. Numerical examples are provided demonstrating the efficacy and promise of our proposed methods.
Abstract:We present a feature-free photogrammetric technique that enables quantitative 3D mesoscopic (mm-scale height variation) imaging with tens-of-micron accuracy from sequences of images acquired by a smartphone at close range (several cm) under freehand motion without additional hardware. Our end-to-end, pixel-intensity-based approach jointly registers and stitches all the images by estimating a coaligned height map, which acts as a pixel-wise radial deformation field that orthorectifies each camera image to allow homographic registration. The height maps themselves are reparameterized as the output of an untrained encoder-decoder convolutional neural network (CNN) with the raw camera images as the input, which effectively removes many reconstruction artifacts. Our method also jointly estimates both the camera's dynamic 6D pose and its distortion using a nonparametric model, the latter of which is especially important in mesoscopic applications when using cameras not designed for imaging at short working distances, such as smartphone cameras. We also propose strategies for reducing computation time and memory, applicable to other multi-frame registration problems. Finally, we demonstrate our method using sequences of multi-megapixel images captured by an unstabilized smartphone on a variety of samples (e.g., painting brushstrokes, circuit board, seeds).
Abstract:It has been conjectured that the Fisher divergence is more robust to model uncertainty than the conventional Kullback-Leibler (KL) divergence. This motivates the design of a new class of robust generative auto-encoders (AE) referred to as Fisher auto-encoders. Our approach is to design Fisher AEs by minimizing the Fisher divergence between the intractable joint distribution of observed data and latent variables, with that of the postulated/modeled joint distribution. In contrast to KL-based variational AEs (VAEs), the Fisher AE can exactly quantify the distance between the true and the model-based posterior distributions. Qualitative and quantitative results are provided on both MNIST and celebA datasets demonstrating the competitive performance of Fisher AEs in terms of robustness compared to other AEs such as VAEs and Wasserstein AEs.
Abstract:We present a method for learning latent stochastic differential equations (SDEs) from high dimensional time series data. Given a time series generated from a lower dimensional It\^{o} process, the proposed method uncovers the relevant parameters of the SDE through a self-supervised learning approach. Using the framework of variational autoencoders (VAEs), we consider a conditional generative model for the data based on the Euler-Maruyama approximation of SDE solutions. Furthermore, we use recent results on identifiability of semi-supervised learning to show that our model can recover not only the underlying SDE parameters, but also the original latent space, up to an isometry, in the limit of infinite data. We validate the model through a series of different simulated video processing tasks where the underlying SDE is known. Our results suggest that the proposed method effectively learns the underlying SDE, as predicted by the theory.
Abstract:We develop a framework for estimating unknown partial differential equations from noisy data, using a deep learning approach. Given noisy samples of a solution to an unknown PDE, our method interpolates the samples using a neural network, and extracts the PDE by equating derivatives of the neural network approximation. Our method applies to PDEs which are linear combinations of user-defined dictionary functions, and generalizes previous methods that only consider parabolic PDEs. We introduce a regularization scheme that prevents the function approximation from overfitting the data and forces it to be a solution of the underlying PDE. We validate the model on simulated data generated by the known PDEs and added Gaussian noise, and we study our method under different levels of noise. We also compare the error of our method with a Cramer-Rao lower bound for an ordinary differential equation. Our results indicate that our method outperforms other methods in estimating PDEs, especially in the low signal-to-noise regime.