Kerui Zhang

Recurrent Temporal Revision Graph Networks

Sep 26, 2023

Yizhou Chen, Anxiang Zeng, Guangda Huzhang, Qingtao Yu, Kerui Zhang, Cao Yuanpeng, Kangle Wu, Han Yu, Zhiming Zhou

Abstract: Temporal graphs model many real-world scenarios more accurately than static graphs. However, neighbor aggregation, a critical building block of graph networks, is currently extended to temporal graphs in a straightforward way from its static-graph counterpart. Involving all historical neighbors in such aggregation can be computationally expensive, so in practice typically only a subset of the most recent neighbors is used. However, such subsampling leads to incomplete and biased neighbor information. To address this limitation, we propose a novel framework for temporal neighbor aggregation that uses a recurrent neural network with node-wise hidden states to integrate information from all historical neighbors of each node, thereby acquiring complete neighbor information. We demonstrate the superior theoretical expressiveness of the proposed framework as well as its state-of-the-art performance in real-world applications. Notably, with 2-layer models it achieves a significant +9.6% improvement in average precision over existing methods on a real-world e-commerce dataset.
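
To illustrate the contrast the abstract draws between recent-neighbor subsampling and a recurrent per-node state, here is a toy sketch under stated assumptions. It is not the paper's architecture: the Elman-style tanh cell, the fixed weights W_H and W_X, and the helper names (RecurrentNeighborAggregator, recent_k_mean) are all hypothetical. It only shows that a node-wise hidden state updated on every neighbor event keeps a trace of old neighbors that a "k most recent" mean discards.

```python
import math
from collections import defaultdict

DIM = 2  # hypothetical hidden-state size for this toy example


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


# Hypothetical fixed weights; a real model would learn these.
W_H = [[0.5, 0.0], [0.0, 0.5]]
W_X = [[1.0, 0.0], [0.0, 1.0]]


def rnn_step(h, x):
    """One Elman-style recurrent update: h' = tanh(W_H h + W_X x)."""
    return [math.tanh(a + b) for a, b in zip(matvec(W_H, h), matvec(W_X, x))]


class RecurrentNeighborAggregator:
    """Keeps one hidden state per node and folds every neighbor event into it, in time order."""

    def __init__(self):
        self.state = defaultdict(lambda: [0.0] * DIM)

    def observe(self, node, neighbor_feat):
        self.state[node] = rnn_step(self.state[node], neighbor_feat)

    def embed(self, node):
        return self.state[node]


def recent_k_mean(events, node, k):
    """Subsampling baseline: mean of the k most recent neighbor features; older ones are dropped."""
    feats = [f for n, f in events if n == node][-k:]
    if not feats:
        return [0.0] * DIM
    return [sum(col) / len(feats) for col in zip(*feats)]


# Two event streams for node "u" that differ only in their oldest neighbor.
events_a = [("u", [1.0, 0.0]), ("u", [0.0, 1.0]), ("u", [0.0, 1.0])]
events_b = [("u", [-1.0, 0.0]), ("u", [0.0, 1.0]), ("u", [0.0, 1.0])]

agg_a, agg_b = RecurrentNeighborAggregator(), RecurrentNeighborAggregator()
for node, feat in events_a:
    agg_a.observe(node, feat)
for node, feat in events_b:
    agg_b.observe(node, feat)

print(recent_k_mean(events_a, "u", k=2), recent_k_mean(events_b, "u", k=2))  # identical
print(agg_a.embed("u"), agg_b.embed("u"))  # differ in the first component
```

With k=2 the subsampled mean cannot distinguish the two histories, while the recurrent state still reflects the sign of the oldest neighbor, at a constant per-event update cost.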

Can Vision-Language Models be a Good Guesser? Exploring VLMs for Times and Location Reasoning

Jul 12, 2023

Gengyuan Zhang, Yurui Zhang, Kerui Zhang, Volker Tresp

Abstract: Vision-Language Models (VLMs) are expected to reason with commonsense knowledge as humans do. For example, humans can infer where and when an image was taken based on their knowledge. This raises the question of whether VLMs pre-trained on large-scale image-text resources can match, or even surpass, humans' ability to reason about time and location from visual cues. To address this question, we propose a two-stage recognition and reasoning probing task, applied to both discriminative and generative VLMs, to uncover whether VLMs can recognize time- and location-relevant features and further reason about them. To facilitate the investigation, we introduce WikiTiLo, a well-curated image dataset comprising images with rich socio-cultural cues. In extensive experimental studies, we find that although VLMs can effectively retain relevant features in their visual encoders, they still fail to reason perfectly. We will release our dataset and code to facilitate future studies.

* 8 pages

BiRA-Net: Bilinear Attention Net for Diabetic Retinopathy Grading

May 15, 2019

Ziyuan Zhao, Kerui Zhang, Xuejie Hao, Jing Tian, Matthew Chin Heng Chua, Li Chen, Xin Xu

Abstract: Diabetic retinopathy (DR) is a common retinal disease that leads to blindness. For diagnostic purposes, DR image grading aims to provide automatic classification of DR grades, a task not addressed by conventional methods that perform binary DR image classification. Small objects in eye images, such as lesions and microaneurysms, are essential to DR grading in medical imaging, but they can easily be influenced by other objects. To address these challenges, we propose a new deep learning architecture, called BiRA-Net, which combines an attention model for feature extraction with a bilinear model for fine-grained classification. Furthermore, to account for the distance between different DR grades, we propose a new loss function, called grading loss, which improves the training convergence of the proposed approach. Experimental results demonstrate the superior performance of the proposed approach.

* Accepted at ICIP 2019
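
The abstract names two ingredients, a bilinear model for fine-grained classification and a grade-distance-aware "grading loss", without giving formulas. The sketch below is a hedged illustration only: bilinear_pool implements standard bilinear pooling (averaged outer products, signed square root, L2 normalization), which may differ from BiRA-Net's exact design, and grading_loss is a hypothetical distance-aware loss (cross-entropy plus alpha times the expected |predicted grade - true grade|), not the paper's definition. The five-grade scale (0 to 4) and all inputs are assumptions for illustration.

```python
import math


def bilinear_pool(features_a, features_b):
    """Average the outer products of two per-location feature sets, then apply
    signed square root and L2 normalization (standard bilinear pooling)."""
    n = len(features_a)
    da, db = len(features_a[0]), len(features_b[0])
    pooled = [0.0] * (da * db)
    for fa, fb in zip(features_a, features_b):
        for i in range(da):
            for j in range(db):
                pooled[i * db + j] += fa[i] * fb[j] / n
    signed = [math.copysign(math.sqrt(abs(v)), v) for v in pooled]
    norm = math.sqrt(sum(v * v for v in signed)) or 1.0
    return [v / norm for v in signed]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grading_loss(logits, true_grade, alpha=1.0):
    """Hypothetical distance-aware loss: cross-entropy plus alpha times the
    expected absolute distance between predicted and true grade."""
    probs = softmax(logits)
    ce = -math.log(probs[true_grade])
    expected_distance = sum(p * abs(k - true_grade) for k, p in enumerate(probs))
    return ce + alpha * expected_distance


# Two spatial locations, with 2-D and 3-D features from two hypothetical streams.
pooled = bilinear_pool([[1.0, 2.0], [3.0, 0.0]], [[0.5, 1.0, 0.0], [1.0, 0.0, 1.0]])
print(len(pooled))  # 6 = 2 * 3

# Five grades (0..4) assumed; the true grade is 0 in both cases.
near_miss = [0.0, 2.0, 0.0, 0.0, 0.0]  # confident in grade 1
far_miss = [0.0, 0.0, 0.0, 0.0, 2.0]   # equally confident in grade 4
print(grading_loss(near_miss, 0), grading_loss(far_miss, 0))
```

Both predictions assign the same probability to the true grade, so plain cross-entropy scores them identically; the distance term is what penalizes the far miss more, which is the kind of grade ordering the abstract says the grading loss takes into account.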
