Yichao Liu

Learning to Discover Generalized Facial Expressions

Sep 30, 2024

Tingzhang Luo, Yichao Liu, Yuanyuan Liu, Andi Zhang, Xin Wang, Chang Tang, Zhe Chen

Abstract: We introduce Facial Expression Category Discovery (FECD), a novel task in the domain of open-world facial expression recognition (O-FER). While Generalized Category Discovery (GCD) has been explored in natural image datasets, applying it to facial expressions presents unique challenges. Specifically, we identify two key biases to better understand these challenges: Theoretical Bias, arising from the introduction of new categories in unlabeled training data, and Practical Bias, stemming from the imbalanced and fine-grained nature of facial expression data. To address these challenges, we propose FER-GCD, an adversarial approach that integrates both implicit and explicit debiasing components. In the implicit debiasing process, we devise F-discrepancy, a novel metric used to estimate the upper bound of Theoretical Bias, helping the model minimize this upper bound through adversarial training. The explicit debiasing process further optimizes the feature generator and classifier to reduce Practical Bias. Extensive experiments on GCD-based FER datasets demonstrate that our FER-GCD framework significantly improves accuracy on both old and new categories, achieving an average improvement of 9.8% over the baseline and outperforming state-of-the-art methods.

Exploring Large Language Models for Human Mobility Prediction under Public Events

Nov 29, 2023

Yuebing Liang, Yichao Liu, Xiaohan Wang, Zhan Zhao

Abstract: Public events, such as concerts and sports games, can be major attractors for large crowds, leading to irregular surges in travel demand. Accurate human mobility prediction for public events is thus crucial for event planning as well as traffic or crowd management. While rich textual descriptions about public events are commonly available from online sources, it is challenging to encode such information in statistical or machine learning models. Existing methods are generally limited in incorporating textual information, handling data sparsity, or providing rationales for their predictions. To address these challenges, we introduce a framework for human mobility prediction under public events (LLM-MPE) based on Large Language Models (LLMs), leveraging their unprecedented ability to process textual data, learn from minimal examples, and generate human-readable explanations. Specifically, LLM-MPE first transforms raw, unstructured event descriptions from online sources into a standardized format, and then segments historical mobility data into regular and event-related components. A prompting strategy is designed to direct LLMs in making and rationalizing demand predictions considering historical mobility and event features. A case study is conducted for Barclays Center in New York City, based on publicly available event information and taxi trip data. Results show that LLM-MPE surpasses traditional models, particularly on event days, with textual data significantly enhancing its accuracy. Furthermore, LLM-MPE offers interpretable insights into its predictions. Despite the great potential of LLMs, we also identify key challenges including misinformation and high costs that remain barriers to their broader adoption in large-scale human mobility analysis.
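
The abstract describes splitting historical mobility into regular and event-related components and then prompting an LLM with both. The following is a minimal sketch of that idea, not the authors' implementation: the hourly-mean baseline, the record layout, and the function names `split_regular_and_event` and `build_prompt` are all assumptions made for illustration, since the abstract does not specify how the split or the prompt is constructed.

```python
from collections import defaultdict
from statistics import mean


def split_regular_and_event(records, event_days):
    """Split demand into a regular baseline and event-day residuals.

    records: list of (day, hour, demand) tuples (hypothetical layout).
    event_days: set of day labels on which an event took place.
    Assumption: the regular component is the mean demand per hour
    over non-event days; the abstract does not state the method used.
    """
    by_hour = defaultdict(list)
    for day, hour, demand in records:
        if day not in event_days:
            by_hour[hour].append(demand)
    regular = {hour: mean(values) for hour, values in by_hour.items()}

    # Event-related component: observed demand minus the regular baseline.
    residuals = [
        (day, hour, demand - regular.get(hour, 0.0))
        for day, hour, demand in records
        if day in event_days
    ]
    return regular, residuals


def build_prompt(event_text, regular, residuals):
    """Assemble a plain-text prompt (hypothetical wording) for an LLM."""
    lines = [
        "Event description: " + event_text,
        "Typical hourly demand on non-event days: " + str(regular),
        "Extra demand observed on past event days: " + str(residuals),
        "Predict the hourly demand for this event and explain your reasoning.",
    ]
    return "\n".join(lines)


records = [("d1", 19, 100), ("d2", 19, 120), ("d3", 19, 300)]
regular, residuals = split_regular_and_event(records, event_days={"d3"})
prompt = build_prompt("Basketball game, 19:30 start", regular, residuals)
```

Keeping the baseline and the residual separate lets the prompt state what is normal for a given hour and what past events added on top, which is one plausible way to give the model both signals.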

Global Attention Mechanism: Retain Information to Enhance Channel-Spatial Interactions

Dec 10, 2021

Yichao Liu, Zongru Shao, Nico Hoffmann

Abstract: A variety of attention mechanisms have been studied to improve the performance of various computer vision tasks. However, prior methods overlooked the significance of retaining information on both channel and spatial aspects to enhance cross-dimension interactions. Therefore, we propose a global attention mechanism that boosts the performance of deep neural networks by reducing information loss and magnifying global interactive representations. We introduce 3D permutation with a multilayer perceptron for channel attention alongside a convolutional spatial attention submodule. Evaluation of the proposed mechanism on image classification with CIFAR-100 and ImageNet-1K indicates that our method consistently outperforms several recent attention mechanisms with both ResNet and the lightweight MobileNet.

* 5 pages, 3 figures, 4 tables
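
To make the channel attention idea concrete, here is a minimal pure-Python sketch of a permutation-plus-MLP channel gate. It is an illustration under stated assumptions, not the paper's code: the two-layer MLP without biases, the ReLU, the sigmoid gate, and the names `gam_channel_attention`, `w1`, and `w2` are assumptions that the abstract does not spell out, and the convolutional spatial submodule is omitted.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(weights, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def gam_channel_attention(x, w1, w2):
    """Channel gate applied at every spatial position.

    x:  feature map as nested lists with shape C x H x W.
    w1: hidden-layer weights, shape (C // r) x C (assumed reduction r).
    w2: output-layer weights, shape C x (C // r).
    Reading the channel vector at each (i, j) is equivalent to permuting
    the map to H x W x C, so no spatial pooling is applied before the MLP.
    """
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    out = [[[0.0] * width for _ in range(height)] for _ in range(channels)]
    for i in range(height):
        for j in range(width):
            vec = [x[c][i][j] for c in range(channels)]
            hidden = [max(0.0, h) for h in matvec(w1, vec)]  # ReLU
            gate = [sigmoid(g) for g in matvec(w2, hidden)]
            for c in range(channels):
                out[c][i][j] = gate[c] * x[c][i][j]
    return out


# Two channels, one spatial position, all-zero weights: every gate is 0.5.
x = [[[2.0]], [[4.0]]]
y = gam_channel_attention(x, w1=[[0.0, 0.0]], w2=[[0.0], [0.0]])
```

Because the gate is computed per position rather than from a pooled summary, the spatial layout of the input is preserved through the channel submodule, which matches the abstract's emphasis on retaining information.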

NAM: Normalization-based Attention Module

Nov 24, 2021

Yichao Liu, Zongru Shao, Yueyang Teng, Nico Hoffmann

Abstract: Recognizing less salient features is key to model compression. However, this has not been investigated in revolutionary attention mechanisms. In this work, we propose a novel normalization-based attention module (NAM), which suppresses less salient weights. It applies a weight sparsity penalty to the attention modules, making them more computationally efficient while retaining similar performance. A comparison with three other attention mechanisms on both ResNet and MobileNet indicates that our method results in higher accuracy. Code for this paper is publicly available at https://github.com/Christian-lyc/NAM.

* 3 pages, 2 figures, 2 tables, 2 tables in the appendix
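
As a rough illustration of normalization-based channel attention, the sketch below reuses batch-normalization scaling factors as channel importance weights and adds an L1 sparsity penalty on them. This is a simplified reading and not the released code at the URL above: the use of the normalized absolute scaling factor as the weight, the sigmoid gate, the flattened per-channel input layout, and the names `nam_channel_attention` and `sparsity_penalty` are assumptions for illustration.

```python
import math


def nam_channel_attention(x, gamma, beta, eps=1e-5):
    """Gate each channel using its normalization scaling factor.

    x:     list of channels, each a flat list of activations
           (batch and spatial positions flattened together; assumed layout).
    gamma: per-channel scaling factors of the normalization layer.
    beta:  per-channel shifts of the normalization layer.
    """
    total = sum(abs(g) for g in gamma) or 1.0  # avoid division by zero
    out = []
    for values, g, b in zip(x, gamma, beta):
        mu = sum(values) / len(values)
        var = sum((v - mu) ** 2 for v in values) / len(values)
        weight = abs(g) / total  # relative importance of this channel
        row = []
        for v in values:
            normed = g * (v - mu) / math.sqrt(var + eps) + b
            gate = 1.0 / (1.0 + math.exp(-weight * normed))
            row.append(gate * v)
        out.append(row)
    return out


def sparsity_penalty(gamma, lam):
    """L1 penalty on the scaling factors, to be added to the training loss."""
    return lam * sum(abs(g) for g in gamma)


x = [[1.0, 3.0], [2.0, 4.0]]
out = nam_channel_attention(x, gamma=[1.0, 0.0], beta=[0.0, 0.0])
penalty = sparsity_penalty([1.0, -0.5], lam=0.5)
```

In this sketch a channel whose scaling factor is driven to zero by the penalty receives a constant gate of 0.5, so it no longer gets any input-dependent emphasis, while channels with larger factors are gated according to their normalized activations.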
