Min-Te Sun

From ChebNet to ChebGibbsNet

Dec 02, 2024

Jie Zhang, Min-Te Sun

Abstract: Recent advancements in Spectral Graph Convolutional Networks (SpecGCNs) have led to state-of-the-art performance in various graph representation learning tasks. To exploit the potential of SpecGCNs, we analyze the corresponding graph filters via polynomial interpolation, the cornerstone of graph signal processing. Different polynomial bases, such as the Bernstein, Chebyshev, and monomial bases, have different convergence rates, which affect the error of polynomial interpolation. Although adopting the Chebyshev basis for interpolation can minimize the maximum error, ChebNet still performs worse than GPR-GNN and BernNet. We point out that this is caused by the Gibbs phenomenon, which occurs when the graph frequency response function approximates the target function. It reduces the approximation ability of a truncated polynomial interpolation. To mitigate the Gibbs phenomenon, we propose applying a Gibbs damping factor to each Chebyshev polynomial term in ChebNet. As a result, our lightweight approach leads to a significant performance boost. Afterwards, we reorganize ChebNet by decoupling feature propagation and transformation. We name this variant ChebGibbsNet. Our experiments indicate that ChebGibbsNet is superior to other advanced SpecGCNs, such as GPR-GNN and BernNet, on both homogeneous and heterogeneous graphs.

* 12 pages, 2 figures, and 7 tables
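
The damping idea in the abstract above can be illustrated with a small numerical sketch. It approximates a step function with a truncated Chebyshev series and then rescales each coefficient by a damping factor. The abstract does not say which damping factor the paper uses, so the Lanczos sigma factor below is an assumption chosen for illustration, as are the function names, the degree `K`, and the step-shaped target.

```python
import numpy as np
from numpy.polynomial import chebyshev as C


def chebyshev_coefficients(f, degree, num_nodes=2000):
    """Estimate Chebyshev series coefficients of f on [-1, 1] by Gauss-Chebyshev quadrature."""
    j = np.arange(num_nodes)
    theta = np.pi * (j + 0.5) / num_nodes
    samples = f(np.cos(theta))
    k = np.arange(degree + 1)
    coeffs = (2.0 / num_nodes) * (np.cos(np.outer(k, theta)) @ samples)
    coeffs[0] *= 0.5  # the constant term carries a factor of 1/2
    return coeffs


def lanczos_sigma(degree):
    """Lanczos sigma factors (an assumed choice of damping, not taken from the paper)."""
    k = np.arange(degree + 1)
    # np.sinc(x) is sin(pi * x) / (pi * x), so the factor for k = 0 is exactly 1.
    return np.sinc(k / (degree + 1))


def step(x):
    """Discontinuous target: 0 for x <= 0 and 1 for x > 0."""
    return (x > 0).astype(float)


K = 20  # truncation degree, chosen arbitrarily for the demo
coeffs = chebyshev_coefficients(step, K)
x = np.linspace(-1.0, 1.0, 2001)

raw = C.chebval(x, coeffs)                        # plain truncated series
damped = C.chebval(x, lanczos_sigma(K) * coeffs)  # each term rescaled by its damping factor

print("overshoot without damping:", raw.max() - 1.0)
print("overshoot with damping:   ", damped.max() - 1.0)
```

The plain truncated series overshoots the target near the discontinuity, while the damped series trades a slightly wider transition for a much smaller overshoot.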

VideoBadminton: A Video Dataset for Badminton Action Recognition

Mar 19, 2024

Qi Li, Tzu-Chen Chiu, Hsiang-Wei Huang, Min-Te Sun, Wei-Shinn Ku

Abstract: In the dynamic and evolving field of computer vision, action recognition has become a key focus, especially with the advent of sophisticated methodologies like Convolutional Neural Networks (CNNs), Convolutional 3D, Transformer, and spatial-temporal feature fusion. These technologies have shown promising results on well-established benchmarks but face unique challenges in real-world applications, particularly in sports analysis, where the precise decomposition of activities and the distinction of subtly different actions are crucial. Existing datasets like UCF101, HMDB51, and Kinetics have offered a diverse range of video data for various scenarios. However, there is an increasing need for fine-grained video datasets that capture detailed categorizations and nuances within broader action categories. In this paper, we introduce the VideoBadminton dataset, derived from high-quality badminton footage. Through an exhaustive evaluation of leading methodologies on this dataset, this study aims to advance the field of action recognition, particularly for badminton. VideoBadminton can serve not only badminton action recognition but also the recognition of fine-grained actions more generally. The insights gained from these evaluations are expected to catalyze further research in action comprehension, especially within sports contexts.

QNet: A Quantum-native Sequence Encoder Architecture

Oct 31, 2022

Wei Day, Hao-Sheng Chen, Min-Te Sun

Abstract: This work investigates how current quantum computers can improve the performance of natural language processing tasks. To achieve this goal, we propose QNet, a novel sequence encoder model that performs inference entirely on the quantum computer using a minimum number of qubits. QNet is inspired by the Transformer, the state-of-the-art neural network model that uses the attention mechanism to relate tokens. While the attention mechanism requires time complexity of $O(n^2 \cdot d)$ to perform matrix multiplication operations, QNet has a quantum circuit depth of merely $O(n+d)$, where $n$ and $d$ represent the length of the sequence and the embedding size, respectively. To employ QNet on NISQ devices, we introduce ResQNet, a quantum-classical hybrid model composed of several QNet blocks linked by residual connections. We evaluate ResQNet on various natural language processing tasks, including text classification, rating score prediction, and named entity recognition. ResQNet exhibits a 6% to 818% performance gain on all these tasks over classical state-of-the-art models using the same embedding dimensions. In summary, this work demonstrates the advantage of quantum computing in natural language processing tasks.

Multi-modal Transformer Path Prediction for Autonomous Vehicle

Aug 15, 2022

Chia Hong Tseng, Jie Zhang, Min-Te Sun, Kazuya Sakai, Wei-Shinn Ku

Abstract: Reasoning about vehicle path prediction is an essential and challenging problem for the safe operation of autonomous driving systems. Many research works address path prediction. However, most of them do not use lane information and are not based on the Transformer architecture. By utilizing different types of data collected from sensors mounted on self-driving vehicles, we propose a path prediction system named Multi-modal Transformer Path Prediction (MTPP) that aims to predict the long-term future trajectory of target agents. To achieve more accurate path prediction, the Transformer architecture is adopted in our model. To better utilize the lane information, lanes that run in the opposite direction to the target agent are unlikely to be taken by that agent and are consequently filtered out. In addition, consecutive lane chunks are combined to ensure that the lane input is long enough for path prediction. An extensive evaluation is conducted to show the efficacy of the proposed system using nuScenes, a real-world trajectory forecasting dataset.

* 9 pages, 12 figures, and 5 tables
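
The lane filtering step described in the abstract above can be sketched as a heading comparison. This is only an illustration: the abstract does not give the filtering rule, so the 90-degree threshold, the dictionary layout, and the function names are hypothetical.

```python
import math


def heading_difference(a, b):
    """Smallest absolute angle between two headings, in radians (handles wrap-around)."""
    return abs((a - b + math.pi) % (2.0 * math.pi) - math.pi)


def filter_lanes(agent_heading, lanes, max_diff=math.pi / 2):
    """Keep lanes whose heading is within max_diff of the agent heading.

    The threshold is an assumed value; lanes pointing the opposite way are dropped.
    """
    return [
        lane for lane in lanes
        if heading_difference(lane["heading"], agent_heading) < max_diff
    ]


# Hypothetical lanes, with headings in radians.
lanes = [
    {"id": "same_direction", "heading": 0.1},
    {"id": "oncoming", "heading": math.pi},
    {"id": "oncoming_wrapped", "heading": -3.0},
    {"id": "merging", "heading": 1.0},
]

kept = filter_lanes(0.0, lanes)
print([lane["id"] for lane in kept])
```

Wrapping the difference into the range from minus pi to pi before taking the absolute value matters here, because headings of pi and -3.0 radians describe nearly the same oncoming direction.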

RFID-Based Indoor Spatial Query Evaluation with Bayesian Filtering Techniques

Apr 02, 2022

Bo Hui, Wenlu Wang, Jiao Yu, Zhitao Gong, Wei-Shinn Ku, Min-Te Sun, Hua Lu

Abstract: People spend a significant amount of time in indoor spaces (e.g., office buildings and subway systems) in their daily lives. Therefore, it is important to develop efficient indoor spatial query algorithms to support various location-based applications. However, indoor spaces differ from outdoor spaces because users have to follow the indoor floor plan when they move. In addition, positioning in indoor environments is mainly based on sensing devices (e.g., RFID readers) rather than GPS devices. Consequently, existing spatial query evaluation techniques devised for outdoor environments cannot be applied to this new challenge. Because Bayesian filtering techniques can estimate the state of a system that changes over time from a sequence of noisy measurements made on the system, we propose Bayesian filtering-based location inference methods as the basis for evaluating indoor spatial queries with noisy raw RFID data. Furthermore, two novel models, the indoor walking graph model and the anchor point indexing model, are created for tracking object locations in indoor environments. Based on the inference method and tracking models, we develop innovative indoor range and k nearest neighbor (kNN) query algorithms. We validate our solution using both synthetic data and real-world data. Our experimental results show that the proposed algorithms can evaluate indoor spatial queries effectively and efficiently. We open-source the code, data, and floor plan at https://github.com/DataScienceLab18/IndoorToolKit.
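
The Bayesian filtering principle mentioned in the abstract above can be sketched with a generic discrete Bayes filter over corridor cells. This is not the paper's inference method: the corridor layout, the motion model, and the reader detection probabilities are all made-up values for illustration.

```python
import numpy as np


def predict(belief, transition):
    """Motion step: push the belief through transition[i, j] = P(next = j | current = i)."""
    return transition.T @ belief


def update(belief, likelihood):
    """Measurement step: weight each cell by P(observation | cell) and renormalize."""
    posterior = belief * likelihood
    return posterior / posterior.sum()


# Hypothetical corridor of 5 cells; the object stays or moves one cell right with equal odds.
n = 5
transition = np.zeros((n, n))
for i in range(n):
    nxt = min(i + 1, n - 1)  # the last cell has nowhere further to go
    transition[i, i] += 0.5
    transition[i, nxt] += 0.5

# Hypothetical RFID reader covering cell 3: 0.9 detection rate in range,
# 0.1 false-positive rate everywhere else.
detection_likelihood = np.full(n, 0.1)
detection_likelihood[3] = 0.9

belief = np.full(n, 1.0 / n)           # start with no knowledge of the location
belief = predict(belief, transition)   # the object may have moved
belief = update(belief, detection_likelihood)  # the reader reports a detection

print(np.round(belief, 3))
```

After a single noisy detection, most of the probability mass concentrates on the cell covered by the reader, while the other cells keep a small share that accounts for false positives.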

sMGC: A Complex-Valued Graph Convolutional Network via Magnetic Laplacian for Directed Graphs

Oct 14, 2021

Jie Zhang, Bo Hui, Po-Wei Harn, Min-Te Sun, Wei-Shinn Ku

Abstract: Recent advancements in Graph Neural Networks have led to state-of-the-art performance on representation learning of graphs for node classification. However, the majority of existing works process directed graphs by symmetrization, which may cause loss of directional information. In this paper, we propose the magnetic Laplacian, which preserves edge directionality by encoding it into a complex phase as a deformation of the combinatorial Laplacian. In addition, we design an Auto-Regressive Moving-Average (ARMA) filter that is capable of learning global features from graphs. To reduce time complexity, Taylor expansion is applied to approximate the filter. We derive complex-valued operations in graph neural networks and devise a simplified Magnetic Graph Convolution network, namely sMGC. Our experimental results demonstrate that sMGC is a fast, powerful, and widely applicable GNN.

* 9 pages, 7 figures, 5 tables
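
The phase encoding described in the abstract above can be sketched with a commonly used form of the magnetic Laplacian. The abstract does not state the exact definition or normalization used in the paper, so the formula below, the parameter `q`, and the example graph are assumptions for illustration.

```python
import numpy as np


def magnetic_laplacian(adjacency, q=0.25):
    """Unnormalized magnetic Laplacian L = D_s - A_s * exp(i * Theta).

    A_s is the symmetrized adjacency, D_s its degree matrix, and
    Theta = 2 * pi * q * (A - A^T) stores edge direction as a complex phase.
    The value of q is an assumed charge parameter.
    """
    a = np.asarray(adjacency, dtype=float)
    a_sym = 0.5 * (a + a.T)
    degree = np.diag(a_sym.sum(axis=1))
    phase = 2.0 * np.pi * q * (a - a.T)
    hermitian_adjacency = a_sym * np.exp(1j * phase)
    return degree - hermitian_adjacency


# Directed 3-cycle: 0 -> 1 -> 2 -> 0.
A = np.array([
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
])

L = magnetic_laplacian(A, q=0.25)
eigenvalues = np.linalg.eigvalsh(L)  # valid because L is Hermitian

print("entry (0, 1):", L[0, 1])
print("entry (1, 0):", L[1, 0])
print("eigenvalues:", np.round(eigenvalues, 4))
```

The entries for the edge between nodes 0 and 1 are complex conjugates of each other, so the direction of the edge survives in the sign of the phase. Setting `q=0` removes the phase and recovers the Laplacian of the symmetrized graph.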
