Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Bin Duan

X-Field: A Physically Grounded Representation for 3D X-ray Reconstruction

Mar 11, 2025

Feiran Wang, Jiachen Tao, Junyi Wu, Haoxuan Wang, Bin Duan, Kai Wang, Zongxin Yang, Yan Yan

Abstract:X-ray imaging is indispensable in medical diagnostics, yet its use is tightly regulated due to potential health risks. To mitigate radiation exposure, recent research focuses on generating novel views from sparse inputs and reconstructing Computed Tomography (CT) volumes, borrowing representations from the 3D reconstruction area. However, these representations originally target visible light imaging that emphasizes reflection and scattering effects, while neglecting penetration and attenuation properties of X-ray imaging. In this paper, we introduce X-Field, the first 3D representation specifically designed for X-ray imaging, rooted in the energy absorption rates across different materials. To accurately model diverse materials within internal structures, we employ 3D ellipsoids with distinct attenuation coefficients. To estimate each material's energy absorption of X-rays, we devise an efficient path partitioning algorithm accounting for complex ellipsoid intersections. We further propose hybrid progressive initialization to refine the geometric accuracy of X-Filed and incorporate material-based optimization to enhance model fitting along material boundaries. Experiments show that X-Field achieves superior visual fidelity on both real-world human organ and synthetic object datasets, outperforming state-of-the-art methods in X-ray Novel View Synthesis and CT Reconstruction.

* Project Page: \url{https://brack-wang.github.io/XField/}, Github Code: \url{https://github.com/Brack-Wang/X-Field}

Via

Access Paper or Ask Questions

MambaReg: Mamba-Based Disentangled Convolutional Sparse Coding for Unsupervised Deformable Multi-Modal Image Registration

Nov 03, 2024

Kaiang Wen, Bin Xie, Bin Duan, Yan Yan

Figure 1 for MambaReg: Mamba-Based Disentangled Convolutional Sparse Coding for Unsupervised Deformable Multi-Modal Image Registration

Figure 2 for MambaReg: Mamba-Based Disentangled Convolutional Sparse Coding for Unsupervised Deformable Multi-Modal Image Registration

Figure 3 for MambaReg: Mamba-Based Disentangled Convolutional Sparse Coding for Unsupervised Deformable Multi-Modal Image Registration

Figure 4 for MambaReg: Mamba-Based Disentangled Convolutional Sparse Coding for Unsupervised Deformable Multi-Modal Image Registration

Abstract:Precise alignment of multi-modal images with inherent feature discrepancies poses a pivotal challenge in deformable image registration. Traditional learning-based approaches often consider registration networks as black boxes without interpretability. One core insight is that disentangling alignment features and non-alignment features across modalities bring benefits. Meanwhile, it is challenging for the prominent methods for image registration tasks, such as convolutional neural networks, to capture long-range dependencies by their local receptive fields. The methods often fail when the given image pair has a large misalignment due to the lack of effectively learning long-range dependencies and correspondence. In this paper, we propose MambaReg, a novel Mamba-based architecture that integrates Mamba's strong capability in capturing long sequences to address these challenges. With our proposed several sub-modules, MambaReg can effectively disentangle modality-independent features responsible for registration from modality-dependent, non-aligning features. By selectively attending to the relevant features, our network adeptly captures the correlation between multi-modal images, enabling focused deformation field prediction and precise image alignment. The Mamba-based architecture seamlessly integrates the local feature extraction power of convolutional layers with the long-range dependency modeling capabilities of Mamba. Experiments on public non-rigid RGB-IR image datasets demonstrate the superiority of our method, outperforming existing approaches in terms of registration accuracy and deformation field smoothness.

Via

Access Paper or Ask Questions

Token Transformation Matters: Towards Faithful Post-hoc Explanation for Vision Transformer

Mar 21, 2024

Junyi Wu, Bin Duan, Weitai Kang, Hao Tang, Yan Yan

Figure 1 for Token Transformation Matters: Towards Faithful Post-hoc Explanation for Vision Transformer

Figure 2 for Token Transformation Matters: Towards Faithful Post-hoc Explanation for Vision Transformer

Figure 3 for Token Transformation Matters: Towards Faithful Post-hoc Explanation for Vision Transformer

Figure 4 for Token Transformation Matters: Towards Faithful Post-hoc Explanation for Vision Transformer

Abstract:While Transformers have rapidly gained popularity in various computer vision applications, post-hoc explanations of their internal mechanisms remain largely unexplored. Vision Transformers extract visual information by representing image regions as transformed tokens and integrating them via attention weights. However, existing post-hoc explanation methods merely consider these attention weights, neglecting crucial information from the transformed tokens, which fails to accurately illustrate the rationales behind the models' predictions. To incorporate the influence of token transformation into interpretation, we propose TokenTM, a novel post-hoc explanation method that utilizes our introduced measurement of token transformation effects. Specifically, we quantify token transformation effects by measuring changes in token lengths and correlations in their directions pre- and post-transformation. Moreover, we develop initialization and aggregation rules to integrate both attention weights and token transformation effects across all layers, capturing holistic token contributions throughout the model. Experimental results on segmentation and perturbation tests demonstrate the superiority of our proposed TokenTM compared to state-of-the-art Vision Transformer explanation methods.

* CVPR 2024

Via

Access Paper or Ask Questions

MaskSAM: Towards Auto-prompt SAM with Mask Classification for Medical Image Segmentation

Mar 21, 2024

Bin Xie, Hao Tang, Bin Duan, Dawen Cai, Yan Yan

Abstract:Segment Anything Model~(SAM), a prompt-driven foundation model for natural image segmentation, has demonstrated impressive zero-shot performance. However, SAM does not work when directly applied to medical image segmentation tasks, since SAM lacks the functionality to predict semantic labels for predicted masks and needs to provide extra prompts, such as points or boxes, to segment target regions. Meanwhile, there is a huge gap between 2D natural images and 3D medical images, so the performance of SAM is imperfect for medical image segmentation tasks. Following the above issues, we propose MaskSAM, a novel mask classification prompt-free SAM adaptation framework for medical image segmentation. We design a prompt generator combined with the image encoder in SAM to generate a set of auxiliary classifier tokens, auxiliary binary masks, and auxiliary bounding boxes. Each pair of auxiliary mask and box prompts, which can solve the requirements of extra prompts, is associated with class label predictions by the sum of the auxiliary classifier token and the learnable global classifier tokens in the mask decoder of SAM to solve the predictions of semantic labels. Meanwhile, we design a 3D depth-convolution adapter for image embeddings and a 3D depth-MLP adapter for prompt embeddings. We inject one of them into each transformer block in the image encoder and mask decoder to enable pre-trained 2D SAM models to extract 3D information and adapt to 3D medical images. Our method achieves state-of-the-art performance on AMOS2022, 90.52% Dice, which improved by 2.7% compared to nnUNet. Our method surpasses nnUNet by 1.7% on ACDC and 1.0% on Synapse datasets.

Via

Access Paper or Ask Questions

Online Multi-spectral Neuron Tracing

Mar 10, 2024

Bin Duan, Yuzhang Shang, Dawen Cai, Yan Yan

Figure 1 for Online Multi-spectral Neuron Tracing

Figure 2 for Online Multi-spectral Neuron Tracing

Figure 3 for Online Multi-spectral Neuron Tracing

Figure 4 for Online Multi-spectral Neuron Tracing

Abstract:In this paper, we propose an online multi-spectral neuron tracing method with uniquely designed modules, where no offline training are required. Our method is trained online to update our enhanced discriminative correlation filter to conglutinate the tracing process. This distinctive offline-training-free schema differentiates us from other training-dependent tracing approaches like deep learning methods since no annotation is needed for our method. Besides, compared to other tracing methods requiring complicated set-up such as for clustering and graph multi-cut, our approach is much easier to be applied to new images. In fact, it only needs a starting bounding box of the tracing neuron, significantly reducing users' configuration effort. Our extensive experiments show that our training-free and easy-configured methodology allows fast and accurate neuron reconstructions in multi-spectral images.

Via

Access Paper or Ask Questions

Towards Saner Deep Image Registration

Jul 24, 2023

Bin Duan, Ming Zhong, Yan Yan

Figure 1 for Towards Saner Deep Image Registration

Figure 2 for Towards Saner Deep Image Registration

Figure 3 for Towards Saner Deep Image Registration

Figure 4 for Towards Saner Deep Image Registration

Abstract:With recent advances in computing hardware and surges of deep-learning architectures, learning-based deep image registration methods have surpassed their traditional counterparts, in terms of metric performance and inference time. However, these methods focus on improving performance measurements such as Dice, resulting in less attention given to model behaviors that are equally desirable for registrations, especially for medical imaging. This paper investigates these behaviors for popular learning-based deep registrations under a sanity-checking microscope. We find that most existing registrations suffer from low inverse consistency and nondiscrimination of identical pairs due to overly optimized image similarities. To rectify these behaviors, we propose a novel regularization-based sanity-enforcer method that imposes two sanity checks on the deep model to reduce its inverse consistency errors and increase its discriminative power simultaneously. Moreover, we derive a set of theoretical guarantees for our sanity-checked image registration method, with experimental results supporting our theoretical findings and their effectiveness in increasing the sanity of models without sacrificing any performance. Our code and models are available at https://github.com/tuffr5/Saner-deep-registration.

* ICCV 2023

Via

Access Paper or Ask Questions

Query-Utterance Attention with Joint modeling for Query-Focused Meeting Summarization

Mar 08, 2023

Xingxian Liu, Bin Duan, Bo Xiao, Yajing Xu

Abstract:Query-focused meeting summarization (QFMS) aims to generate summaries from meeting transcripts in response to a given query. Previous works typically concatenate the query with meeting transcripts and implicitly model the query relevance only at the token level with attention mechanism. However, due to the dilution of key query-relevant information caused by long meeting transcripts, the original transformer-based model is insufficient to highlight the key parts related to the query. In this paper, we propose a query-aware framework with joint modeling token and utterance based on Query-Utterance Attention. It calculates the utterance-level relevance to the query with a dense retrieval module. Then both token-level query relevance and utterance-level query relevance are combined and incorporated into the generation process with attention mechanism explicitly. We show that the query relevance of different granularities contributes to generating a summary more related to the query. Experimental results on the QMSum dataset show that the proposed model achieves new state-of-the-art performance.

* icassp 2023

Via

Access Paper or Ask Questions

Optical Flow Estimation in 360$^\circ$ Videos: Dataset, Model and Application

Jan 27, 2023

Bin Duan, Keshav Bhandari, Gaowen Liu, Yan Yan

$Figure 1 for Optical Flow Estimation in 360$^\circ$ Videos: Dataset, Model and Application$

$Figure 2 for Optical Flow Estimation in 360$^\circ$ Videos: Dataset, Model and Application$

$Figure 3 for Optical Flow Estimation in 360$^\circ$ Videos: Dataset, Model and Application$

$Figure 4 for Optical Flow Estimation in 360$^\circ$ Videos: Dataset, Model and Application$

Abstract:Optical flow estimation has been a long-lasting and fundamental problem in the computer vision community. However, despite the advances of optical flow estimation in perspective videos, the 360$^\circ$ videos counterpart remains in its infancy, primarily due to the shortage of benchmark datasets and the failure to accommodate the omnidirectional nature of 360$^\circ$ videos. We propose the first perceptually realistic 360$^\circ$ filed-of-view video benchmark dataset, namely FLOW360, with 40 different videos and 4,000 video frames. We then conduct comprehensive characteristic analysis and extensive comparisons with existing datasets, manifesting FLOW360's perceptual realism, uniqueness, and diversity. Moreover, we present a novel Siamese representation Learning framework for Omnidirectional Flow (SLOF) estimation, which is trained in a contrastive manner via a hybrid loss that combines siamese contrastive and optical flow losses. By training the model on random rotations of the input omnidirectional frames, our proposed contrastive scheme accommodates the omnidirectional nature of optical flow estimation in 360$^\circ$ videos, resulting in significantly reduced prediction errors. The learning scheme is further proven to be efficient by expanding our siamese learning scheme and omnidirectional optical flow estimation to the egocentric activity recognition task, where the classification accuracy is boosted up to $\sim$26%. To summarize, we study the optical flow estimation in 360$^\circ$ videos problem from perspectives of the benchmark dataset, learning model, and also practical application. The FLOW360 dataset and code are available at https://siamlof.github.io.

* 20 pages, 14 figures, conference extension. arXiv admin note: substantial text overlap with arXiv:2208.03620

Via

Access Paper or Ask Questions

Learning Omnidirectional Flow in 360-degree Video via Siamese Representation

Aug 07, 2022

Keshav Bhandari, Bin Duan, Gaowen Liu, Hugo Latapie, Ziliang Zong, Yan Yan

Abstract:Optical flow estimation in omnidirectional videos faces two significant issues: the lack of benchmark datasets and the challenge of adapting perspective video-based methods to accommodate the omnidirectional nature. This paper proposes the first perceptually natural-synthetic omnidirectional benchmark dataset with a 360-degree field of view, FLOW360, with 40 different videos and 4,000 video frames. We conduct comprehensive characteristic analysis and comparisons between our dataset and existing optical flow datasets, which manifest perceptual realism, uniqueness, and diversity. To accommodate the omnidirectional nature, we present a novel Siamese representation Learning framework for Omnidirectional Flow (SLOF). We train our network in a contrastive manner with a hybrid loss function that combines contrastive loss and optical flow loss. Extensive experiments verify the proposed framework's effectiveness and show up to 40% performance improvement over the state-of-the-art approaches. Our FLOW360 dataset and code are available at https://siamlof.github.io/.

* Accepted to ECCV22

Via

Access Paper or Ask Questions

MLP-GAN for Brain Vessel Image Segmentation

Jul 17, 2022

Bin Xie, Hao Tang, Bin Duan, Dawen Cai, Yan Yan

Figure 1 for MLP-GAN for Brain Vessel Image Segmentation

Figure 2 for MLP-GAN for Brain Vessel Image Segmentation

Figure 3 for MLP-GAN for Brain Vessel Image Segmentation

Figure 4 for MLP-GAN for Brain Vessel Image Segmentation

Abstract:Brain vessel image segmentation can be used as a promising biomarker for better prevention and treatment of different diseases. One successful approach is to consider the segmentation as an image-to-image translation task and perform a conditional Generative Adversarial Network (cGAN) to learn a transformation between two distributions. In this paper, we present a novel multi-view approach, MLP-GAN, which splits a 3D volumetric brain vessel image into three different dimensional 2D images (i.e., sagittal, coronal, axial) and then feed them into three different 2D cGANs. The proposed MLP-GAN not only alleviates the memory issue which exists in the original 3D neural networks but also retains 3D spatial information. Specifically, we utilize U-Net as the backbone for our generator and redesign the pattern of skip connection integrated with the MLP-Mixer which has attracted lots of attention recently. Our model obtains the ability to capture cross-patch information to learn global information with the MLP-Mixer. Extensive experiments are performed on the public brain vessel dataset that show our MLP-GAN outperforms other state-of-the-art methods. We release our code at https://github.com/bxie9/MLP-GAN

* 7 pages. Accepted to ICPR 2022

Via

Access Paper or Ask Questions