Yan Xie

Higher-order Structure Boosts Link Prediction on Temporal Graphs

May 21, 2025

Jingzhe Liu, Zhigang Hua, Yan Xie, Bingheng Li, Harry Shomer, Yu Song, Kaveh Hassani, Jiliang Tang

Abstract: Temporal Graph Neural Networks (TGNNs) have gained growing attention for modeling and predicting structures in temporal graphs. However, existing TGNNs primarily focus on pairwise interactions while overlooking higher-order structures that are integral to link formation and evolution in real-world temporal graphs. Meanwhile, these models often suffer from efficiency bottlenecks, further limiting their expressive power. To tackle these challenges, we propose a Higher-order structure Temporal Graph Neural Network (HTGN), which incorporates hypergraph representations into temporal graph learning. In particular, we develop an algorithm to identify the underlying higher-order structures, enhancing the model's ability to capture group interactions. Furthermore, by aggregating multiple edge features into hyperedge representations, HTGN effectively reduces memory cost during training. We theoretically demonstrate the enhanced expressiveness of our approach and validate its effectiveness and efficiency through extensive experiments on various real-world temporal graphs. Experimental results show that HTGN achieves superior performance on dynamic link prediction while reducing memory costs by up to 50% compared to existing methods.

Discovering Fine-Grained Visual-Concept Relations by Disentangled Optimal Transport Concept Bottleneck Models

May 12, 2025

Yan Xie, Zequn Zeng, Hao Zhang, Yucheng Ding, Yi Wang, Zhengjue Wang, Bo Chen, Hongwei Liu

Abstract: Concept Bottleneck Models (CBMs) try to make the decision-making process transparent by exploring an intermediate concept space between the input image and the output prediction. Existing CBMs learn only coarse-grained relations between the whole image and the concepts and give little consideration to local image information, leading to two main drawbacks: i) they often produce spurious visual-concept relations, hence decreasing model reliability; and ii) though CBMs can explain the importance of every concept to the final prediction, it is still challenging to tell which visual region produces the prediction. To solve these problems, this paper proposes a Disentangled Optimal Transport CBM (DOT-CBM) framework to explore fine-grained visual-concept relations between local image patches and concepts. Specifically, we model the concept prediction process as a transportation problem between the patches and concepts, thereby achieving explicit fine-grained feature alignment. We also incorporate orthogonal projection losses within the modality to enhance local feature disentanglement. To further address the shortcut issues caused by statistical biases in the data, we utilize the visual saliency map and concept label statistics as transportation priors. Thus, DOT-CBM can visualize inversion heatmaps, provide more reliable concept predictions, and produce more accurate class predictions. Comprehensive experiments demonstrate that our proposed DOT-CBM achieves SOTA performance on several tasks, including image classification, local part detection and out-of-distribution generalization.
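
To make the "transportation problem between the patches and concepts" concrete, here is a minimal sketch of entropy-regularized optimal transport solved with Sinkhorn iterations. It is not the DOT-CBM implementation: the embeddings are random placeholders, the cosine-distance cost, the `eps` value and the function name `sinkhorn` are assumptions, and the uniform marginals stand in for the saliency-map and concept-label priors mentioned in the abstract.

```python
import numpy as np

def sinkhorn(cost, a, b, eps=0.5, n_iters=500):
    """Entropy-regularized OT plan with row marginals a and column marginals b."""
    K = np.exp(-cost / eps)              # Gibbs kernel of the cost matrix
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)                # rescale columns toward marginal b
        u = a / (K @ v)                  # rescale rows toward marginal a
    return u[:, None] * K * v[None, :]   # transport plan

rng = np.random.default_rng(0)
patches = rng.normal(size=(6, 4))        # hypothetical patch embeddings
concepts = rng.normal(size=(3, 4))       # hypothetical concept embeddings

# Cosine distance as an assumed transport cost between patches and concepts.
p = patches / np.linalg.norm(patches, axis=1, keepdims=True)
c = concepts / np.linalg.norm(concepts, axis=1, keepdims=True)
cost = 1.0 - p @ c.T

a = np.full(6, 1 / 6)                    # assumed uniform prior over patches
b = np.full(3, 1 / 3)                    # assumed uniform prior over concepts
plan = sinkhorn(cost, a, b)              # plan[i, j]: mass moved from patch i to concept j
```

Each row of `plan` shows how one patch distributes its mass over concepts, which is the kind of fine-grained patch-to-concept alignment the abstract describes; replacing `a` and `b` with non-uniform vectors is where priors would enter.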

* CVPR 2025

HICEScore: A Hierarchical Metric for Image Captioning Evaluation

Jul 26, 2024

Zequn Zeng, Jianqiao Sun, Hao Zhang, Tiansheng Wen, Yudi Su, Yan Xie, Zhengjue Wang, Bo Chen

Abstract: Image captioning evaluation metrics can be divided into two categories: reference-based metrics and reference-free metrics. However, reference-based approaches may struggle to evaluate descriptive captions with abundant visual details produced by advanced multimodal large language models, due to their heavy reliance on limited human-annotated references. In contrast, previous reference-free metrics have been proven effective via CLIP cross-modality similarity. Nonetheless, CLIP-based metrics, constrained by their focus on global image-text compatibility, often fail to detect local textual hallucinations and are insensitive to small visual objects. Besides, their single-scale designs are unable to provide an interpretable evaluation process, such as pinpointing the position of caption mistakes and identifying visual regions that have not been described. To move forward, we propose a novel reference-free metric for image captioning evaluation, dubbed Hierarchical Image Captioning Evaluation Score (HICE-S). By detecting local visual regions and textual phrases, HICE-S builds an interpretable hierarchical scoring mechanism, breaking through the barriers of the single-scale structure of existing reference-free metrics. Comprehensive experiments indicate that our proposed metric achieves SOTA performance on several benchmarks, outperforming existing reference-free metrics like CLIP-S and PAC-S, and reference-based metrics like METEOR and CIDEr. Moreover, several case studies reveal that the assessment process of HICE-S on detailed captions closely resembles interpretable human judgments. Our code is available at https://github.com/joeyz0z/HICE.

* Accepted by ACM MM2024

A Scalable and Effective Alternative to Graph Transformers

Jun 17, 2024

Kaan Sancak, Zhigang Hua, Jin Fang, Yan Xie, Andrey Malevich, Bo Long, Muhammed Fatih Balin, Ümit V. Çatalyürek

Abstract: Graph Neural Networks (GNNs) have shown impressive performance in graph representation learning, but they face challenges in capturing long-range dependencies due to their limited expressive power. To address this, Graph Transformers (GTs) were introduced, utilizing the self-attention mechanism to effectively model pairwise node relationships. Despite their advantages, GTs suffer from quadratic complexity w.r.t. the number of nodes in the graph, hindering their applicability to large graphs. In this work, we present Graph-Enhanced Contextual Operator (GECO), a scalable and effective alternative to GTs that leverages neighborhood propagation and global convolutions to effectively capture local and global dependencies in quasilinear time. Our study on synthetic datasets reveals that GECO reaches a 169x speedup on a graph with 2M nodes w.r.t. optimized attention. Further evaluations on a diverse range of benchmarks showcase that GECO scales to large graphs where traditional GTs often face memory and time limitations. Notably, GECO consistently achieves comparable or superior quality compared to baselines, improving the SOTA by up to 4.5%, and offering a scalable and effective solution for large-scale graph learning.
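
The abstract does not say how the global convolutions reach quasilinear time. One standard way to convolve a length-n sequence with a length-n filter in O(n log n) is the FFT, sketched below under that assumption. This is a generic illustration, not GECO's operator; the function name `global_conv_fft` and the random inputs are hypothetical.

```python
import numpy as np

def global_conv_fft(x, k):
    """Linear convolution of x with a filter k of the same length, via FFT.

    Zero-padding to length 2n avoids circular wrap-around; only the first
    n outputs are kept, so the result has the same length as the input.
    """
    n = x.shape[0]
    m = 2 * n
    y = np.fft.irfft(np.fft.rfft(x, m) * np.fft.rfft(k, m), m)
    return y[:n]

rng = np.random.default_rng(0)
x = rng.normal(size=1024)        # hypothetical per-node scalar feature, in some node order
k = rng.normal(size=1024)        # hypothetical global filter spanning the whole sequence
y_fft = global_conv_fft(x, k)
y_ref = np.convolve(x, k)[:1024] # direct O(n^2) reference for comparison
```

Because the filter is as long as the sequence, every output position can depend on every earlier input, which is what makes the convolution "global" while the cost stays near-linear in n.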

* Under submission

Degree of Irrationality: Sentiment and Implied Volatility Surface

May 20, 2024

Jiahao Weng, Yan Xie

Abstract: In this study, we constructed daily high-frequency sentiment data and used the VAR method to attempt to predict the next day's implied volatility surface. We utilized 630,000 text entries from the East Money Stock Forum from 2014 to 2023 and employed deep learning methods such as BERT and LSTM to build daily market sentiment indicators. By applying FFT and EMD methods for sentiment decomposition, we found that high-frequency sentiment had a stronger correlation with at-the-money (ATM) options' implied volatility, while low-frequency sentiment was more strongly correlated with deep out-of-the-money (DOTM) options' implied volatility. Further analysis revealed that the shape of the implied volatility surface contains richer market sentiment information beyond just market panic. We demonstrated that incorporating this sentiment information can improve the accuracy of implied volatility surface predictions.
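
As an illustration of the FFT-based sentiment decomposition, the sketch below splits a daily series into low- and high-frequency parts with a hard frequency cutoff. The cutoff value, the synthetic series and the function name `split_by_frequency` are assumptions; the abstract does not specify how the authors chose frequency bands, and the EMD variant is not shown.

```python
import numpy as np

def split_by_frequency(series, cutoff):
    """Split a daily series into low- and high-frequency components.

    cutoff is in cycles per day; components at or below it go to `low`,
    and `high` is the remainder, so low + high reproduces the input.
    """
    spec = np.fft.rfft(series)
    freqs = np.fft.rfftfreq(len(series), d=1.0)
    low_spec = np.where(freqs <= cutoff, spec, 0)
    low = np.fft.irfft(low_spec, n=len(series))
    high = series - low
    return low, high

# Synthetic stand-in for a sentiment index: a slow cycle plus a fast cycle.
t = np.arange(256)
slow = np.sin(2 * np.pi * 4 * t / 256)          # 64-day period
fast = 0.5 * np.sin(2 * np.pi * 64 * t / 256)   # 4-day period
sentiment = slow + fast

low, high = split_by_frequency(sentiment, cutoff=0.05)
```

Each component could then be correlated separately with implied volatility at different moneyness levels, which is the comparison the abstract reports.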

* 21 pages, 8 figures

VCR-Graphormer: A Mini-batch Graph Transformer via Virtual Connections

Mar 24, 2024

Dongqi Fu, Zhigang Hua, Yan Xie, Jin Fang, Si Zhang, Kaan Sancak, Hao Wu, Andrey Malevich, Jingrui He, Bo Long

Abstract: The graph transformer has proven to be an effective graph learning method owing to its attention mechanism, which is capable of capturing expressive representations from complex topological and feature information of graphs. A graph transformer conventionally performs dense attention (or global attention) for every pair of nodes to learn node representation vectors, resulting in quadratic computational costs that are unaffordable for large-scale graph data. Therefore, mini-batch training for graph transformers is a promising direction, but the limited samples in each mini-batch cannot support effective dense attention to encode informative representations. Facing this bottleneck, (1) we start by assigning each node a token list that is sampled by personalized PageRank (PPR) and then apply standard multi-head self-attention only on this list to compute its node representations. This PPR tokenization method decouples model training from complex graph topological information and makes heavy feature engineering offline and independent, such that mini-batch training of graph transformers is possible by loading each node's token list in batches. We further prove this PPR tokenization is viable as a graph convolution network with a fixed polynomial filter and jumping knowledge. However, using only personalized PageRank may limit the information carried by a token list, and thus may not support different graph inductive biases for model training. To this end, (2) we rewire graphs by introducing multiple types of virtual connections through structure- and content-based super nodes that enable PPR tokenization to encode local and global contexts, long-range interaction, and heterophilous information into each node's token list, and then formalize our Virtual Connection Ranking based Graph Transformer (VCR-Graphormer).
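
Step (1) above can be sketched as follows: compute personalized PageRank scores from a source node by power iteration, then keep the highest-scoring nodes as that node's token list. This is a simplified reading, not the VCR-Graphormer code. Taking the top-k nodes (rather than another sampling scheme), the restart probability `alpha`, the dense adjacency matrix and both function names are assumptions made for illustration.

```python
import numpy as np

def personalized_pagerank(adj, source, alpha=0.15, n_iters=100):
    """PPR scores from `source` by power iteration (alpha = restart probability)."""
    deg = adj.sum(axis=1, keepdims=True)
    P = adj / np.maximum(deg, 1)          # row-stochastic transition matrix
    e = np.zeros(adj.shape[0])
    e[source] = 1.0                       # restart distribution on the source node
    r = e.copy()
    for _ in range(n_iters):
        r = alpha * e + (1 - alpha) * (P.T @ r)
    return r

def ppr_token_list(adj, source, k):
    """Indices of the k nodes with the highest PPR score for `source`."""
    scores = personalized_pagerank(adj, source)
    return np.argsort(-scores)[:k]

# Toy graph: a triangle {0, 1, 2} and a separate edge {3, 4}.
adj = np.zeros((5, 5))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4)]:
    adj[i, j] = adj[j, i] = 1.0

scores = personalized_pagerank(adj, source=0)
tokens = ppr_token_list(adj, source=0, k=3)
```

Token lists like `tokens` can be precomputed offline for every node and loaded in batches, so self-attention only runs over k tokens per node instead of over all node pairs.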

MeaCap: Memory-Augmented Zero-shot Image Captioning

Mar 06, 2024

Zequn Zeng, Yan Xie, Hao Zhang, Chiyu Chen, Zhengjue Wang, Bo Chen

Abstract: Zero-shot image captioning (IC) without well-paired image-text data can be divided into two categories: training-free and text-only-training. Generally, these two types of methods realize zero-shot IC by integrating pre-trained vision-language models like CLIP for image-text similarity evaluation and a pre-trained language model (LM) for caption generation. The main difference between them is whether a textual corpus is used to train the LM. Though achieving attractive performance w.r.t. some metrics, existing methods often exhibit some common drawbacks. Training-free methods tend to produce hallucinations, while text-only-training methods often lose generalization capability. To move forward, in this paper, we propose a novel Memory-Augmented zero-shot image Captioning framework (MeaCap). Specifically, equipped with a textual memory, we introduce a retrieve-then-filter module to get key concepts that are highly related to the image. By deploying our proposed memory-augmented visual-related fusion score in a keywords-to-sentence LM, MeaCap can generate concept-centered captions that keep high consistency with the image, with fewer hallucinations and more world knowledge. The MeaCap framework achieves state-of-the-art performance on a series of zero-shot IC settings. Our code is available at https://github.com/joeyz0z/MeaCap.

* Accepted by CVPR2024

Dynamic Balancing of Humanoid Robot Walker3 with Proprioceptive Actuation: Systematic Design of Algorithm, Software and Hardware

Aug 09, 2021

Yan Xie, Jiajun Wang, Hao Dong, Xiaoyu Ren, Liqun Huang, Mingguo Zhao

Abstract: Dynamic balancing under uncertain disturbances is important for a humanoid robot and requires a good capability of coordinating the entire body redundancy to execute multiple tasks. Whole-body control (WBC) based on hierarchical optimization has been generally accepted and utilized in torque-controlled robots. A good hierarchy is the prerequisite for WBC and can be predefined according to prior knowledge. However, real-time computation can be problematic in physical applications given the computational complexity of WBC. For robots with proprioceptive actuation, the joint friction in the gear reducer also degrades the torque tracking performance. In our paper, a reasonable hierarchy of tasks and constraints is first customized for robot dynamic balancing. Then a real-time WBC is implemented via computationally efficient WBC software. The method runs on a modular master control system, UBTMaster, characterized by real-time communication and powerful computing capability. After the joint friction is accounted for through model identification, extensive experiments on various balancing scenarios are conducted on a humanoid Walker3 with proprioceptive actuation. The robot shows an outstanding balance performance even under external impulses and when its two feet are independently subjected to inclination and shift disturbances. The results demonstrate that with the strict hierarchy, real-time computation and carefully handled joint friction, the robot with proprioceptive actuation can manage dynamic physical interactions with unstructured environments well.

* journal

Semi-supervised Optimal Transport with Self-paced Ensemble for Cross-hospital Sepsis Early Detection

Jun 18, 2021

Ruiqing Ding, Yu Zhou, Jie Xu, Yan Xie, Qiqiang Liang, He Ren, Yixuan Wang, Yanlin Chen, Leye Wang, Man Huang

Abstract: The use of computer technology to solve problems in medical scenarios has attracted considerable attention in recent years and still has great potential and room for exploration. In particular, machine learning has been widely used in the prediction, diagnosis and even treatment of Sepsis. However, state-of-the-art methods require large amounts of labeled medical data for supervised learning. In real-world applications, the lack of labeled data will cause enormous obstacles if one hospital wants to deploy a new Sepsis detection system. Different from the supervised learning setting, we need to use known information (e.g., from another hospital with rich labeled data) to help build a model with acceptable performance, i.e., transfer learning. In this paper, we propose a semi-supervised optimal transport with self-paced ensemble framework for Sepsis early detection, called SPSSOT, to transfer knowledge from another hospital that has rich labeled data. In SPSSOT, we first extract the same clinical indicators from the source domain (e.g., a hospital with rich labeled data) and the target domain (e.g., a hospital with little labeled data), then we combine semi-supervised domain adaptation based on optimal transport theory with self-paced under-sampling to avoid negative transfer possibly caused by covariate shift and class imbalance. On the whole, SPSSOT is an end-to-end transfer learning method for Sepsis early detection that can automatically select suitable samples from the two domains according to the number of iterations and align the feature spaces of the two domains. Extensive experiments on two open clinical datasets demonstrate that, compared with other methods, our proposed SPSSOT can significantly improve the AUC values with only 1% labeled data in the target domain in two transfer learning scenarios, MIMIC $\rightarrow$ Challenge and Challenge $\rightarrow$ MIMIC.

* 14 pages, 9 figures
