Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jingwei Song

Code-Vision: Evaluating Multimodal LLMs Logic Understanding and Code Generation Capabilities

Feb 17, 2025

Hanbin Wang, Xiaoxuan Zhou, Zhipeng Xu, Keyuan Cheng, Yuxin Zuo, Kai Tian, Jingwei Song, Junting Lu, Wenhui Hu, Xueyang Liu

Abstract:This paper introduces Code-Vision, a benchmark designed to evaluate the logical understanding and code generation capabilities of Multimodal Large Language Models (MLLMs). It challenges MLLMs to generate a correct program that fulfills specific functionality requirements based on a given flowchart, which visually represents the desired algorithm or process. Code-Vision comprises three subsets: HumanEval-V, Algorithm, and MATH, which evaluate MLLMs' coding abilities across basic programming, algorithmic, and mathematical problem-solving domains. Our experiments evaluate 12 MLLMs on Code-Vision. Experimental results demonstrate that there is a large performance difference between proprietary and open-source models. On Hard problems, GPT-4o can achieve 79.3% pass@1, but the best open-source model only achieves 15%. Further experiments reveal that Code-Vision can pose unique challenges compared to other multimodal reasoning benchmarks MMCode and MathVista. We also explore the reason for the poor performance of the open-source models. All data and codes are available at https://github.com/wanghanbinpanda/CodeVision.

* 15 pages

Via

Access Paper or Ask Questions

DynaWeightPnP: Toward global real-time 3D-2D solver in PnP without correspondences

Sep 27, 2024

Jingwei Song, Maani Ghaffari

Figure 1 for DynaWeightPnP: Toward global real-time 3D-2D solver in PnP without correspondences

Figure 2 for DynaWeightPnP: Toward global real-time 3D-2D solver in PnP without correspondences

Figure 3 for DynaWeightPnP: Toward global real-time 3D-2D solver in PnP without correspondences

Figure 4 for DynaWeightPnP: Toward global real-time 3D-2D solver in PnP without correspondences

Abstract:This paper addresses a special Perspective-n-Point (PnP) problem: estimating the optimal pose to align 3D and 2D shapes in real-time without correspondences, termed as correspondence-free PnP. While several studies have focused on 3D and 2D shape registration, achieving both real-time and accurate performance remains challenging. This study specifically targets the 3D-2D geometric shape registration tasks, applying the recently developed Reproducing Kernel Hilbert Space (RKHS) to address the "big-to-small" issue. An iterative reweighted least squares method is employed to solve the RKHS-based formulation efficiently. Moreover, our work identifies a unique and interesting observability issue in correspondence-free PnP: the numerical ambiguity between rotation and translation. To address this, we proposed DynaWeightPnP, introducing a dynamic weighting sub-problem and an alternative searching algorithm designed to enhance pose estimation and alignment accuracy. Experiments were conducted on a typical case, that is, a 3D-2D vascular centerline registration task within Endovascular Image-Guided Interventions (EIGIs). Results demonstrated that the proposed algorithm achieves registration processing rates of 60 Hz (without post-refinement) and 31 Hz (with post-refinement) on modern single-core CPUs, with competitive accuracy comparable to existing methods. These results underscore the suitability of DynaWeightPnP for future robot navigation tasks like EIGIs.

Via

Access Paper or Ask Questions

SLAM assisted 3D tracking system for laparoscopic surgery

Sep 18, 2024

Jingwei Song, Ray Zhang, Wenwei Zhang, Hao Zhou, Maani Ghaffari

Figure 1 for SLAM assisted 3D tracking system for laparoscopic surgery

Figure 2 for SLAM assisted 3D tracking system for laparoscopic surgery

Figure 3 for SLAM assisted 3D tracking system for laparoscopic surgery

Figure 4 for SLAM assisted 3D tracking system for laparoscopic surgery

Abstract:A major limitation of minimally invasive surgery is the difficulty in accurately locating the internal anatomical structures of the target organ due to the lack of tactile feedback and transparency. Augmented reality (AR) offers a promising solution to overcome this challenge. Numerous studies have shown that combining learning-based and geometric methods can achieve accurate preoperative and intraoperative data registration. This work proposes a real-time monocular 3D tracking algorithm for post-registration tasks. The ORB-SLAM2 framework is adopted and modified for prior-based 3D tracking. The primitive 3D shape is used for fast initialization of the monocular SLAM. A pseudo-segmentation strategy is employed to separate the target organ from the background for tracking purposes, and the geometric prior of the 3D shape is incorporated as an additional constraint in the pose graph. Experiments from in-vivo and ex-vivo tests demonstrate that the proposed 3D tracking system provides robust 3D tracking and effectively handles typical challenges such as fast motion, out-of-field-of-view scenarios, partial visibility, and "organ-background" relative motion.

* Demo: https://youtu.be/B1xZW8bj3cM

Via

Access Paper or Ask Questions

RKHS-BA: A Semantic Correspondence-Free Multi-View Registration Framework with Global Tracking

Mar 02, 2024

Ray Zhang, Jingwei Song, Xiang Gao, Junzhe Wu, Tianyi Liu, Jinyuan Zhang, Ryan Eustice, Maani Ghaffari

Figure 1 for RKHS-BA: A Semantic Correspondence-Free Multi-View Registration Framework with Global Tracking

Figure 2 for RKHS-BA: A Semantic Correspondence-Free Multi-View Registration Framework with Global Tracking

Figure 3 for RKHS-BA: A Semantic Correspondence-Free Multi-View Registration Framework with Global Tracking

Figure 4 for RKHS-BA: A Semantic Correspondence-Free Multi-View Registration Framework with Global Tracking

Abstract:This work reports a novel Bundle Adjustment (BA) formulation using a Reproducing Kernel Hilbert Space (RKHS) representation called RKHS-BA. The proposed formulation is correspondence-free, enables the BA to use RGB-D/LiDAR and semantic labels in the optimization directly, and provides a generalization for the photometric loss function commonly used in direct methods. RKHS-BA can incorporate appearance and semantic labels within a continuous spatial-semantic functional representation that does not require optimization via image pyramids. We demonstrate its applications in sliding-window odometry and global LiDAR mapping, which show highly robust performance in extremely challenging scenes and the best trade-off of generalization and accuracy.

* 16 pages, 12 figures, technical report under review

Via

Access Paper or Ask Questions

BDIS-SLAM: A lightweight CPU-based dense stereo SLAM for surgery

Dec 25, 2023

Jingwei Song, Ray Zhang, Qiuchen Zhu, Jianyu Lin, Maani Ghaffari

Figure 1 for BDIS-SLAM: A lightweight CPU-based dense stereo SLAM for surgery

Figure 2 for BDIS-SLAM: A lightweight CPU-based dense stereo SLAM for surgery

Figure 3 for BDIS-SLAM: A lightweight CPU-based dense stereo SLAM for surgery

Figure 4 for BDIS-SLAM: A lightweight CPU-based dense stereo SLAM for surgery

Abstract:Purpose: Common dense stereo Simultaneous Localization and Mapping (SLAM) approaches in Minimally Invasive Surgery (MIS) require high-end parallel computational resources for real-time implementation. Yet, it is not always feasible since the computational resources should be allocated to other tasks like segmentation, detection, and tracking. To solve the problem of limited parallel computational power, this research aims at a lightweight dense stereo SLAM system that works on a single-core CPU and achieves real-time performance (more than 30 Hz in typical scenarios). Methods: A new dense stereo mapping module is integrated with the ORB-SLAM2 system and named BDIS-SLAM. Our new dense stereo mapping module includes stereo matching and 3D dense depth mosaic methods. Stereo matching is achieved with the recently proposed CPU-level real-time matching algorithm Bayesian Dense Inverse Searching (BDIS). A BDIS-based shape recovery and a depth mosaic strategy are integrated as a new thread and coupled with the backbone ORB-SLAM2 system for real-time stereo shape recovery. Results: Experiments on in-vivo data sets show that BDIS-SLAM runs at over 30 Hz speed on modern single-core CPU in typical endoscopy/colonoscopy scenarios. BDIS-SLAM only consumes around an additional 12% time compared with the backbone ORB-SLAM2. Although our lightweight BDIS-SLAM simplifies the process by ignoring deformation and fusion procedures, it can provide a usable dense mapping for modern MIS on computationally constrained devices. Conclusion: The proposed BDIS-SLAM is a lightweight stereo dense SLAM system for MIS. It achieves 30 Hz on a modern single-core CPU in typical endoscopy/colonoscopy scenarios (image size around 640*480). BDIS-SLAM provides a low-cost solution for dense mapping in MIS and has the potential to be applied in surgical robots and AR systems.

* This paper has been accepted by International Journal of Computer Assisted Radiology and Surgery. Code is available at https://github.com/JingweiSong/BDIS-SLAM

Via

Access Paper or Ask Questions

Iterative PnP and its application in 3D-2D vascular image registration for robot navigation

Oct 19, 2023

Jingwei Song, Keke Yang, Zheng Zhang, Meng Li, Tuoyu Cao, Maani Ghaffari

Abstract:This paper reports on a new real-time robot-centered 3D-2D vascular image alignment algorithm, which is robust to outliers and can align nonrigid shapes. Few works have managed to achieve both real-time and accurate performance for vascular intervention robots. This work bridges high-accuracy 3D-2D registration techniques and computational efficiency requirements in intervention robot applications. We categorize centerline-based vascular 3D-2D image registration problems as an iterative Perspective-n-Point (PnP) problem and propose to use the Levenberg-Marquardt solver on the Lie manifold. Then, the recently developed Reproducing Kernel Hilbert Space (RKHS) algorithm is introduced to overcome the ``big-to-small'' problem in typical robotic scenarios. Finally, an iterative reweighted least squares is applied to solve RKHS-based formulation efficiently. Experiments indicate that the proposed algorithm processes registration over 50 Hz (rigid) and 20 Hz (nonrigid) and obtains competing registration accuracy similar to other works. Results indicate that our Iterative PnP is suitable for future vascular intervention robot applications.

* Submitted to ICRA 2024

Via

Access Paper or Ask Questions

Optical flow-based vascular respiratory motion compensation

Aug 31, 2023

Keke Yang, Zheng Zhang, Meng Li, Tuoyu Cao, Maani Ghaffari, Jingwei Song

Abstract:This paper develops a new vascular respiratory motion compensation algorithm, Motion-Related Compensation (MRC), to conduct vascular respiratory motion compensation by extrapolating the correlation between invisible vascular and visible non-vascular. Robot-assisted vascular intervention can significantly reduce the radiation exposure of surgeons. In robot-assisted image-guided intervention, blood vessels are constantly moving/deforming due to respiration, and they are invisible in the X-ray images unless contrast agents are injected. The vascular respiratory motion compensation technique predicts 2D vascular roadmaps in live X-ray images. When blood vessels are visible after contrast agents injection, vascular respiratory motion compensation is conducted based on the sparse Lucas-Kanade feature tracker. An MRC model is trained to learn the correlation between vascular and non-vascular motions. During the intervention, the invisible blood vessels are predicted with visible tissues and the trained MRC model. Moreover, a Gaussian-based outlier filter is adopted for refinement. Experiments on in-vivo data sets show that the proposed method can yield vascular respiratory motion compensation in 0.032 sec, with an average error 1.086 mm. Our real-time and accurate vascular respiratory motion compensation approach contributes to modern vascular intervention and surgical robots.

* This manuscript has been accepted by IEEE Robotics and Automation Letters

Via

Access Paper or Ask Questions

BDIS: Bayesian Dense Inverse Searching Method for Real-Time Stereo Surgical Image Matching

May 06, 2022

Jingwei Song, Qiuchen Zhu, Jianyu Lin, Maani Ghaffari

Figure 1 for BDIS: Bayesian Dense Inverse Searching Method for Real-Time Stereo Surgical Image Matching

Figure 2 for BDIS: Bayesian Dense Inverse Searching Method for Real-Time Stereo Surgical Image Matching

Figure 3 for BDIS: Bayesian Dense Inverse Searching Method for Real-Time Stereo Surgical Image Matching

Figure 4 for BDIS: Bayesian Dense Inverse Searching Method for Real-Time Stereo Surgical Image Matching

Abstract:In stereoscope-based Minimally Invasive Surgeries (MIS), dense stereo matching plays an indispensable role in 3D shape recovery, AR, VR, and navigation tasks. Although numerous Deep Neural Network (DNN) approaches are proposed, the conventional prior-free approaches are still popular in the industry because of the lack of open-source annotated data set and the limitation of the task-specific pre-trained DNNs. Among the prior-free stereo matching algorithms, there is no successful real-time algorithm in none GPU environment for MIS. This paper proposes the first CPU-level real-time prior-free stereo matching algorithm for general MIS tasks. We achieve an average 17 Hz on 640*480 images with a single-core CPU (i5-9400) for surgical images. Meanwhile, it achieves slightly better accuracy than the popular ELAS. The patch-based fast disparity searching algorithm is adopted for the rectified stereo images. A coarse-to-fine Bayesian probability and a spatial Gaussian mixed model were proposed to evaluate the patch probability at different scales. An optional probability density function estimation algorithm was adopted to quantify the prediction variance. Extensive experiments demonstrated the proposed method's capability to handle ambiguities introduced by the textureless surfaces and the photometric inconsistency from the non-Lambertian reflectance and dark illumination. The estimated probability managed to balance the confidences of the patches for stereo images at different scales. It has similar or higher accuracy and fewer outliers than the baseline ELAS in MIS, while it is 4-5 times faster. The code and the synthetic data sets are available at https://github.com/JingweiSong/BDIS-v2.

Via

Access Paper or Ask Questions

Fusing Convolutional Neural Network and Geometric Constraint for Image-based Indoor Localization

Jan 05, 2022

Jingwei Song, Mitesh Patel, Maani Ghaffari

Figure 1 for Fusing Convolutional Neural Network and Geometric Constraint for Image-based Indoor Localization

Figure 2 for Fusing Convolutional Neural Network and Geometric Constraint for Image-based Indoor Localization

Figure 3 for Fusing Convolutional Neural Network and Geometric Constraint for Image-based Indoor Localization

Figure 4 for Fusing Convolutional Neural Network and Geometric Constraint for Image-based Indoor Localization

Abstract:This paper proposes a new image-based localization framework that explicitly localizes the camera/robot by fusing Convolutional Neural Network (CNN) and sequential images' geometric constraints. The camera is localized using a single or few observed images and training images with 6-degree-of-freedom pose labels. A Siamese network structure is adopted to train an image descriptor network, and the visually similar candidate image in the training set is retrieved to localize the testing image geometrically. Meanwhile, a probabilistic motion model predicts the pose based on a constant velocity assumption. The two estimated poses are finally fused using their uncertainties to yield an accurate pose prediction. This method leverages the geometric uncertainty and is applicable in indoor scenarios predominated by diffuse illumination. Experiments on simulation and real data sets demonstrate the efficiency of our proposed method. The results further show that combining the CNN-based framework with geometric constraint achieves better accuracy when compared with CNN-only methods, especially when the training data size is small.

* Accepted by IEEE robotics and automation letters

Via

Access Paper or Ask Questions

Bayesian dense inverse searching algorithm for real-time stereo matching in minimally invasive surgery

Jun 14, 2021

Jingwei Song, Qiuchen Zhu, Jianyu Lin, Maani Ghaffari

Figure 1 for Bayesian dense inverse searching algorithm for real-time stereo matching in minimally invasive surgery

Figure 2 for Bayesian dense inverse searching algorithm for real-time stereo matching in minimally invasive surgery

Figure 3 for Bayesian dense inverse searching algorithm for real-time stereo matching in minimally invasive surgery

Figure 4 for Bayesian dense inverse searching algorithm for real-time stereo matching in minimally invasive surgery

Abstract:This paper reports a CPU-level real-time stereo matching method for surgical images (10 Hz on 640 * 480 image with a single core of i5-9400). The proposed method is built on the fast ''dense inverse searching'' algorithm, which estimates the disparity of the stereo images. The overlapping image patches (arbitrary squared image segment) from the images at different scales are aligned based on the photometric consistency presumption. We propose a Bayesian framework to evaluate the probability of the optimized patch disparity at different scales. Moreover, we introduce a spatial Gaussian mixed probability distribution to address the pixel-wise probability within the patch. In-vivo and synthetic experiments show that our method can handle ambiguities resulted from the textureless surfaces and the photometric inconsistency caused by the Lambertian reflectance. Our Bayesian method correctly balances the probability of the patch for stereo images at different scales. Experiments indicate that the estimated depth has higher accuracy and fewer outliers than the baseline methods in the surgical scenario.

Via

Access Paper or Ask Questions