Abstract:Existing benchmarks like NLGraph and GraphQA evaluate LLMs on graphs by focusing mainly on pairwise relationships, overlooking the high-order correlations found in real-world data. Hypergraphs, which can model complex beyond-pairwise relationships, offer a more robust framework but are still underexplored in the context of LLMs. To address this gap, we introduce LLM4Hypergraph, the first comprehensive benchmark comprising 21,500 problems across eight low-order, five high-order, and two isomorphism tasks, utilizing both synthetic and real-world hypergraphs from citation networks and protein structures. We evaluate six prominent LLMs, including GPT-4o, demonstrating our benchmark's effectiveness in identifying model strengths and weaknesses. Our specialized prompting framework incorporates seven hypergraph languages and introduces two novel techniques, Hyper-BAG and Hyper-COT, which enhance high-order reasoning and achieve an average 4% (up to 9%) performance improvement on structure classification tasks. This work establishes a foundational testbed for integrating hypergraph computational capabilities into LLMs, advancing their comprehension of high-order structures.
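To make the prompting idea concrete, below is a minimal sketch of how a hypergraph might be serialized into one textual hypergraph language and combined with a chain-of-thought style cue; the encoding format, function names, and the exact wording of the cue are illustrative assumptions, not the prompts used by Hyper-BAG or Hyper-COT.

```python
# Minimal sketch: serialize a hypergraph as text and append a chain-of-thought
# style cue. The encoding and cue wording are assumptions, not the paper's prompts.
def encode_hypergraph(vertices, hyperedges):
    """Serialize a hypergraph with a simple incidence-style language."""
    lines = [f"Vertices: {', '.join(map(str, vertices))}"]
    for i, e in enumerate(hyperedges):
        lines.append(f"Hyperedge e{i}: {{{', '.join(map(str, e))}}}")
    return "\n".join(lines)

def build_prompt(vertices, hyperedges, question):
    structure = encode_hypergraph(vertices, hyperedges)
    cot_cue = "Let's reason about the high-order structure step by step."
    return f"{structure}\n\nQuestion: {question}\n{cot_cue}"

if __name__ == "__main__":
    prompt = build_prompt(
        vertices=[0, 1, 2, 3],
        hyperedges=[{0, 1, 2}, {2, 3}],
        question="Do vertices 0 and 3 share a hyperedge?",
    )
    print(prompt)
```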
Abstract:We introduce Hyper-YOLO, a new object detection method that integrates hypergraph computations to capture the complex high-order correlations among visual features. Traditional YOLO models, while powerful, have limitations in their neck designs that restrict the integration of cross-level features and the exploitation of high-order feature interrelationships. To address these challenges, we propose the Hypergraph Computation Empowered Semantic Collecting and Scattering (HGC-SCS) framework, which transposes visual feature maps into a semantic space and constructs a hypergraph for high-order message propagation. This enables the model to acquire both semantic and structural information, advancing beyond conventional feature-focused learning. Hyper-YOLO incorporates the proposed Mixed Aggregation Network (MANet) in its backbone for enhanced feature extraction and introduces the Hypergraph-Based Cross-Level and Cross-Position Representation Network (HyperC2Net) in its neck. HyperC2Net operates across five scales and breaks free from traditional grid structures, allowing for sophisticated high-order interactions across levels and positions. This synergy of components positions Hyper-YOLO as a state-of-the-art architecture across various model scales, as evidenced by its superior performance on the COCO dataset. Specifically, Hyper-YOLO-N significantly outperforms the advanced YOLOv8-N and YOLOv9-T with 12\% $\text{AP}^{val}$ and 9\% $\text{AP}^{val}$ improvements. The source code is available at https://github.com/iMoonLab/Hyper-YOLO.
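As a rough illustration of the hypergraph computation step described above, the sketch below builds hyperedges from the k-nearest neighbours of flattened feature-map vertices and performs a two-stage vertex-to-hyperedge-to-vertex propagation; the construction rule, normalization, and tensor shapes are assumptions, not the exact HGC-SCS or HyperC2Net implementation.

```python
import torch

def knn_hyperedges(feats, k=5):
    """Build one hyperedge per vertex from its k nearest neighbours in feature
    space (a common construction; the exact rule in HGC-SCS may differ)."""
    dist = torch.cdist(feats, feats)                # (N, N) pairwise distances
    idx = dist.topk(k + 1, largest=False).indices   # each vertex plus k neighbours
    N = feats.size(0)
    H = torch.zeros(N, N)                           # incidence matrix: vertex x hyperedge
    H.scatter_(0, idx.t(), 1.0)
    return H

def hypergraph_message_passing(X, H):
    """Two-stage propagation: vertices -> hyperedges -> vertices."""
    De = H.sum(0).clamp(min=1)                      # hyperedge degrees
    Dv = H.sum(1).clamp(min=1)                      # vertex degrees
    edge_feat = (H.t() @ X) / De.unsqueeze(1)       # collect vertex features into hyperedges
    return (H @ edge_feat) / Dv.unsqueeze(1)        # scatter hyperedge features back to vertices

X = torch.randn(64, 128)                            # 64 flattened feature-map "vertices"
H = knn_hyperedges(X, k=5)
X_new = hypergraph_message_passing(X, H)
print(X_new.shape)                                  # torch.Size([64, 128])
```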
Abstract:Hypergraph Neural Networks (HGNNs) have recently attracted much attention and exhibited satisfactory performance due to their superiority in high-order correlation modeling. However, the high-order modeling capability of hypergraphs also brings increased computational complexity, which hinders practical industrial deployment. In practice, we find that one key barrier to the efficient deployment of HGNNs is the high-order structural dependencies during inference. In this paper, we propose to bridge the gap between HGNNs and inference-efficient Multi-Layer Perceptrons (MLPs) to eliminate the hypergraph dependency of HGNNs and thus reduce computational complexity and improve inference speed. Specifically, we introduce LightHGNN and LightHGNN$^+$ for fast inference with low complexity. LightHGNN directly distills knowledge from teacher HGNNs to student MLPs via soft labels, and LightHGNN$^+$ further explicitly injects reliable high-order correlations into the student MLPs to achieve topology-aware distillation and resistance to over-smoothing. Experiments on eight hypergraph datasets demonstrate that, even without hypergraph dependency, the proposed LightHGNNs can still achieve competitive or even better performance than HGNNs and outperform vanilla MLPs by $16.3$ on average. Extensive experiments on three graph datasets further show that our LightHGNNs achieve the best average performance among all compared methods. Experiments on synthetic hypergraphs with 55,000 vertices indicate that LightHGNNs can run $100\times$ faster than HGNNs, showcasing their ability for latency-sensitive deployments.
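A minimal sketch of the soft-label distillation step, assuming a standard temperature-scaled KL objective; the temperature, loss weighting, and network sizes are illustrative, and the topology-aware term of LightHGNN$^+$ is not shown.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Soft-label distillation: cross-entropy on ground truth plus KL divergence
    to the teacher's softened predictions (T and alpha are illustrative choices,
    not the paper's settings)."""
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * ce + (1 - alpha) * kd

# Toy usage: an MLP student mimicking a frozen teacher on 100 vertices.
student = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 4))
x, labels = torch.randn(100, 16), torch.randint(0, 4, (100,))
teacher_logits = torch.randn(100, 4)                 # stand-in for HGNN outputs
loss = distillation_loss(student(x), teacher_logits, labels)
loss.backward()
```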
Abstract:Multi-modal Magnetic Resonance Imaging (MRI) offers complementary diagnostic information, but some modalities are limited by the long scanning time. To accelerate the whole acquisition process, reconstructing one modality from highly undersampled k-space data with the aid of another fully-sampled reference modality is an efficient solution. However, the misalignment between modalities, which is common in clinical practice, can negatively affect reconstruction quality. Existing deep learning-based methods that account for inter-modality misalignment perform better, but still share two common limitations: (1) the spatial alignment task is not adaptively integrated with the reconstruction process, resulting in insufficient complementarity between the two tasks; (2) the entire framework has weak interpretability. In this paper, we construct a novel Deep Unfolding Network with Spatial Alignment, termed DUN-SA, to appropriately embed the spatial alignment task into the reconstruction process. Concretely, we derive a novel joint alignment-reconstruction model with a specially designed cross-modal spatial alignment term. By relaxing the model into cross-modal spatial alignment and multi-modal reconstruction tasks, we propose an effective algorithm that solves this model in an alternating manner. Then, we unfold the iterative steps of the proposed algorithm and design corresponding network modules to build DUN-SA with interpretability. Through end-to-end training, we effectively compensate for spatial misalignment using only the reconstruction loss, and utilize the progressively aligned reference modality to provide an inter-modality prior that improves the reconstruction of the target modality. Comprehensive experiments on three real datasets demonstrate that our method exhibits superior reconstruction performance compared to state-of-the-art methods.
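The unfolding pattern described above can be sketched schematically as a fixed number of stages that alternate an alignment update and a reconstruction update; the sub-network internals below are simple placeholders and the data-consistency step is omitted, so this is only an assumption-laden outline, not the DUN-SA architecture.

```python
import torch
import torch.nn as nn

class UnfoldedStage(nn.Module):
    """One unfolded iteration: align the reference modality to the current
    estimate, then refine the reconstruction (both sub-networks are placeholders)."""
    def __init__(self, channels=2):
        super().__init__()
        self.align = nn.Conv2d(2 * channels, channels, 3, padding=1)   # stands in for the alignment module
        self.recon = nn.Conv2d(2 * channels, channels, 3, padding=1)   # stands in for the reconstruction module

    def forward(self, x, ref):
        ref_aligned = self.align(torch.cat([ref, x], dim=1))
        x = x + self.recon(torch.cat([x, ref_aligned], dim=1))          # residual update
        return x, ref_aligned

class UnfoldedNet(nn.Module):
    def __init__(self, num_stages=5):
        super().__init__()
        self.stages = nn.ModuleList([UnfoldedStage() for _ in range(num_stages)])

    def forward(self, x_undersampled, ref):
        x, r = x_undersampled, ref
        for stage in self.stages:
            x, r = stage(x, r)
        return x

net = UnfoldedNet()
out = net(torch.randn(1, 2, 64, 64), torch.randn(1, 2, 64, 64))
print(out.shape)   # torch.Size([1, 2, 64, 64])
```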
Abstract:Neural network solvers represent an innovative and promising approach for tackling time-fractional partial differential equations by utilizing deep learning techniques. L1 interpolation approximation serves as the standard method for addressing time-fractional derivatives within neural network solvers. However, we have discovered that neural network solvers based on L1 interpolation approximation are unable to fully exploit the benefits of neural networks, and the accuracy of these models is constrained by the interpolation error. In this paper, we present the high-precision Hermite Neural Solver (HNS) for solving time-fractional partial differential equations. Specifically, we first construct a high-order explicit approximation scheme for fractional derivatives using Hermite interpolation techniques, and rigorously analyze its approximation accuracy. Afterward, taking into account the infinitely differentiable properties of deep neural networks, we integrate the high-order Hermite interpolation explicit approximation scheme with deep neural networks to propose HNS. The experimental results show that HNS achieves higher accuracy than methods based on the L1 scheme for both forward and inverse problems, as well as in high-dimensional scenarios. This indicates that HNS offers significantly improved accuracy and flexibility compared to existing L1-based methods, and overcomes the limitation of explicit finite difference approximation methods that are often constrained to function value interpolation. As a result, HNS is not a simple combination of numerical computing methods and neural networks, but rather achieves complementary and mutually reinforcing advantages from both approaches. The data and code can be found at \url{https://github.com/hsbhc/HNS}.
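For context, the sketch below implements the standard L1 approximation of the Caputo derivative of order α ∈ (0, 1) that HNS is compared against, together with a sanity check on u(t) = t; the Hermite-interpolation scheme proposed in the paper is not reproduced here.

```python
import numpy as np
from math import gamma

def caputo_l1(u_vals, tau, alpha):
    """Standard L1 approximation of the Caputo derivative of order alpha in (0, 1)
    at t_n = n * tau, given u on the uniform grid t_0, ..., t_n."""
    n = len(u_vals) - 1
    j = np.arange(n)
    b = (j + 1) ** (1 - alpha) - j ** (1 - alpha)          # L1 weights
    diffs = u_vals[1:][::-1] - u_vals[:-1][::-1]           # u(t_{n-j}) - u(t_{n-j-1})
    return (tau ** (-alpha) / gamma(2 - alpha)) * np.dot(b, diffs)

# Sanity check on u(t) = t, whose Caputo derivative is t^(1-alpha) / Gamma(2-alpha).
alpha, tau, n = 0.5, 1e-3, 1000
t = np.linspace(0.0, n * tau, n + 1)
approx = caputo_l1(t, tau, alpha)
exact = (n * tau) ** (1 - alpha) / gamma(2 - alpha)
print(abs(approx - exact))   # near round-off: the L1 scheme is exact for linear u
```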
Abstract:The isomorphism problem is a fundamental problem in network analysis, which involves capturing both low-order and high-order structural information. In terms of extracting low-order structural information, graph isomorphism algorithms analyze structural equivalence to reduce the dimension of the solution space, which demonstrates their power in many applications, such as protein design, chemical pathways, and community detection. For the high-order relationships that occur more commonly in real-life scenarios, the problem of hypergraph isomorphism, which effectively captures these high-order structural relationships, cannot be straightforwardly addressed using graph isomorphism methods. Besides, existing hypergraph kernel methods may suffer from high memory consumption or inaccurate sub-structure identification, thus yielding sub-optimal performance. In this paper, to address the aforementioned problems, we first propose the hypergraph Weisfeiler-Lehman test algorithm for the hypergraph isomorphism test problem by generalizing the Weisfeiler-Lehman test algorithm from graphs to hypergraphs. Secondly, based on the presented algorithm, we propose a general hypergraph Weisfeiler-Lehman kernel framework and implement two instances, the Hypergraph Weisfeiler-Lehman Subtree Kernel and the Hypergraph Weisfeiler-Lehman Hyperedge Kernel. We design a comprehensive set of experiments covering seven graph classification datasets and 12 hypergraph classification datasets. Results on the hypergraph classification datasets show significant improvements over other typical kernel-based methods, demonstrating the effectiveness of the proposed methods. In our evaluation, we also found that the proposed methods outperform the second-best method in terms of runtime, running over 80 times faster when handling complex hypergraph structures.
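One plausible form of a single hypergraph WL refinement round is sketched below: hyperedges hash the multiset of their member-vertex labels, and vertices then hash the multiset of their incident-hyperedge labels; the exact update rule, ordering, and hashing used in the paper are assumptions.

```python
def hypergraph_wl_iteration(vertex_labels, hyperedge_members, hyperedge_labels):
    """One refinement round: hyperedges hash the multiset of member-vertex labels,
    then vertices hash the multiset of incident-hyperedge labels (the precise
    update order and hashing in the paper may differ)."""
    new_edge_labels = {
        e: hash((hyperedge_labels[e], tuple(sorted(vertex_labels[v] for v in members))))
        for e, members in hyperedge_members.items()
    }
    new_vertex_labels = {}
    for v, lbl in vertex_labels.items():
        incident = sorted(new_edge_labels[e] for e, m in hyperedge_members.items() if v in m)
        new_vertex_labels[v] = hash((lbl, tuple(incident)))
    return new_vertex_labels, new_edge_labels

# Toy hypergraph: two hyperedges over four vertices, uniform initial labels.
members = {"e0": {0, 1, 2}, "e1": {2, 3}}
v_lbl = {v: 0 for v in range(4)}
e_lbl = {e: 0 for e in members}
for _ in range(2):
    v_lbl, e_lbl = hypergraph_wl_iteration(v_lbl, members, e_lbl)
print(v_lbl)   # vertices 0 and 1 stay in one class; vertices 2 and 3 separate out
```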
Abstract:Deep learning (DL) has proven highly effective for ultrasound-based computer-aided diagnosis (CAD) of breast cancers. In an automatic CAD system, lesion detection is critical for the subsequent diagnosis. However, existing DL-based methods generally require voluminous manually-annotated region of interest (ROI) labels and class labels to train both the lesion detection and diagnosis models. In clinical practice, the ROI labels, i.e., ground truths, may not always be optimal for the classification task due to the individual experience of sonologists, resulting in coarse annotations that limit the diagnosis performance of a CAD model. To address this issue, a novel Two-Stage Detection and Diagnosis Network (TSDDNet) is proposed based on weakly supervised learning to enhance the diagnostic accuracy of ultrasound-based CAD for breast cancers. In particular, all ROI-level labels are treated as coarse labels in the first training stage, and a candidate selection mechanism is then designed to identify optimal lesion areas for both the fully and partially annotated samples. It refines the current ROI-level labels in the fully annotated images and the detected ROIs in the partially annotated samples in a weakly supervised manner under the guidance of class labels. In the second training stage, a self-distillation strategy is further proposed to integrate the detection network and classification network into a unified framework as the final CAD model for joint optimization, which further improves the diagnosis performance. The proposed TSDDNet is evaluated on a B-mode ultrasound dataset, and the experimental results show that it achieves the best performance on both lesion detection and diagnosis tasks, suggesting promising application potential.
Abstract:The multi-scale information among whole slide images (WSIs) is essential for cancer diagnosis. Although existing multi-scale vision Transformers have shown their effectiveness in learning multi-scale image representations, they still cannot work well on gigapixel WSIs due to their extremely large image sizes. To this end, we propose a novel Multi-scale Efficient Graph-Transformer (MEGT) framework for WSI classification. The key idea of MEGT is to adopt two independent Efficient Graph-based Transformer (EGT) branches to process the low-resolution and high-resolution patch embeddings (i.e., tokens in a Transformer) of WSIs, respectively, and then fuse these tokens via a multi-scale feature fusion module (MFFM). Specifically, we design an EGT to efficiently learn the local-global information of patch tokens, which integrates graph representations into the Transformer to capture the spatially related information of WSIs. Meanwhile, we propose a novel MFFM to alleviate the semantic gap among patches of different resolutions during feature fusion, which creates a non-patch token for each branch as an agent to exchange information with the other branch via cross-attention. In addition, to expedite network training, a novel token pruning module is developed in EGT to reduce redundant tokens. Extensive experiments on the TCGA-RCC and CAMELYON16 datasets demonstrate the effectiveness of the proposed MEGT.
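The fusion idea can be sketched as follows: each resolution branch owns a learnable non-patch (agent) token that cross-attends to the other branch's patch tokens; the shared attention layer, dimensions, and concatenation back into the token sequence are illustrative assumptions rather than the MFFM implementation.

```python
import torch
import torch.nn as nn

class AgentTokenFusion(nn.Module):
    """Each resolution branch owns a non-patch token that cross-attends to the
    other branch's patch tokens, acting as a message carrier between scales."""
    def __init__(self, dim=256, num_heads=4):
        super().__init__()
        self.low_agent = nn.Parameter(torch.zeros(1, 1, dim))
        self.high_agent = nn.Parameter(torch.zeros(1, 1, dim))
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, low_tokens, high_tokens):
        b = low_tokens.size(0)
        low_agent = self.low_agent.expand(b, -1, -1)
        high_agent = self.high_agent.expand(b, -1, -1)
        # The low-resolution agent gathers information from high-resolution tokens and vice versa.
        low_msg, _ = self.cross_attn(low_agent, high_tokens, high_tokens)
        high_msg, _ = self.cross_attn(high_agent, low_tokens, low_tokens)
        low_out = torch.cat([low_msg, low_tokens], dim=1)
        high_out = torch.cat([high_msg, high_tokens], dim=1)
        return low_out, high_out

fusion = AgentTokenFusion()
low, high = torch.randn(2, 100, 256), torch.randn(2, 400, 256)
low_out, high_out = fusion(low, high)
print(low_out.shape, high_out.shape)   # torch.Size([2, 101, 256]) torch.Size([2, 401, 256])
```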
Abstract:Multi-modal Magnetic Resonance Imaging (MRI) plays an important role in clinical medicine. However, the acquisition of some modalities, such as the T2-weighted modality, takes a long time and is often accompanied by motion artifacts. On the other hand, the T1-weighted image (T1WI) shares the same underlying information as the T2-weighted image (T2WI) while requiring a shorter scanning time. Therefore, in this paper we accelerate the acquisition of the T2WI by introducing the auxiliary modality (T1WI). Concretely, we first reconstruct high-quality T2WIs from under-sampled T2WIs. Here, we realize fast T2WI reconstruction by reducing the sampling rate in k-space. Second, we establish a cross-modal synthesis task to generate synthetic T2WIs for guiding better T2WI reconstruction. Here, we obtain the synthetic T2WIs by decomposing the whole cross-modal generation mapping into two optimal transport (OT) processes: the spatial alignment mapping on the T1 image manifold and the cross-modal synthesis mapping from aligned T1WIs to T2WIs. This overcomes the negative transfer caused by spatial misalignment. Then, we prove that the reconstruction and synthesis tasks are well complementary. Finally, we compare the proposed method with state-of-the-art approaches on the open FastMRI dataset and an in-house dataset to verify its validity.
Abstract:Fast and accurate MRI reconstruction is a key concern in modern clinical practice. Recently, numerous deep learning methods have been proposed for MRI reconstruction; however, they usually fail to reconstruct sharp details from subsampled k-space data. To solve this problem, we propose a lightweight and accurate Edge Attention MRI Reconstruction Network (EAMRI) to reconstruct images with edge guidance. Specifically, we design an efficient Edge Prediction Network to directly predict accurate edges from the blurred image. Meanwhile, we propose a novel Edge Attention Module (EAM) to guide the image reconstruction using the extracted edge priors, inspired by the popular self-attention mechanism. EAM first projects the input image into Q_image and V_image, and the edges into K_edge. Then EAM pairs Q_image with K_edge along the channel dimension, such that 1) it can search globally for the high-frequency image features that are activated by the edge priors, and 2) the overall computational burden is largely reduced compared with traditional spatial-wise attention. With the help of EAM, the predicted edge priors can effectively guide the model to reconstruct high-quality MR images with accurate edges. Extensive experiments show that our proposed EAMRI outperforms other methods with fewer parameters and can recover more accurate edges.
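A minimal sketch of the channel-wise attention pattern described above, assuming the image features supply Q and V, the edge map supplies K, and attention is computed over channels so the attention matrix is C×C instead of HW×HW; the projection layers, edge-map channel count, and scaling are illustrative.

```python
import torch
import torch.nn as nn

class ChannelEdgeAttention(nn.Module):
    """Channel-wise attention: Q and V come from the image, K from the edge map.
    The attention matrix is C x C, so the cost scales with channels, not spatial size."""
    def __init__(self, channels=64):
        super().__init__()
        self.q_proj = nn.Conv2d(channels, channels, 1)
        self.k_proj = nn.Conv2d(1, channels, 1)        # assumes a single-channel edge map
        self.v_proj = nn.Conv2d(channels, channels, 1)

    def forward(self, image_feat, edge_map):
        b, c, h, w = image_feat.shape
        q = self.q_proj(image_feat).flatten(2)          # (B, C, HW)
        k = self.k_proj(edge_map).flatten(2)            # (B, C, HW)
        v = self.v_proj(image_feat).flatten(2)          # (B, C, HW)
        attn = torch.softmax(q @ k.transpose(1, 2) / (h * w) ** 0.5, dim=-1)  # (B, C, C)
        out = attn @ v                                   # (B, C, HW)
        return out.view(b, c, h, w) + image_feat         # residual connection

eam = ChannelEdgeAttention()
out = eam(torch.randn(1, 64, 32, 32), torch.randn(1, 1, 32, 32))
print(out.shape)   # torch.Size([1, 64, 32, 32])
```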