Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xiaohang Xu

JiraiBench: A Bilingual Benchmark for Evaluating Large Language Models' Detection of Human Self-Destructive Behavior Content in Jirai Community

Mar 27, 2025

Yunze Xiao, Tingyu He, Lionel Z. Wang, Yiming Ma, Xingyu Song, Xiaohang Xu, Irene Li, Ka Chung Ng

Abstract:This paper introduces JiraiBench, the first bilingual benchmark for evaluating large language models' effectiveness in detecting self-destructive content across Chinese and Japanese social media communities. Focusing on the transnational "Jirai" (landmine) online subculture that encompasses multiple forms of self-destructive behaviors including drug overdose, eating disorders, and self-harm, we present a comprehensive evaluation framework incorporating both linguistic and cultural dimensions. Our dataset comprises 10,419 Chinese posts and 5,000 Japanese posts with multidimensional annotation along three behavioral categories, achieving substantial inter-annotator agreement. Experimental evaluations across four state-of-the-art models reveal significant performance variations based on instructional language, with Japanese prompts unexpectedly outperforming Chinese prompts when processing Chinese content. This emergent cross-cultural transfer suggests that cultural proximity can sometimes outweigh linguistic similarity in detection tasks. Cross-lingual transfer experiments with fine-tuned models further demonstrate the potential for knowledge transfer between these language systems without explicit target language training. These findings highlight the need for culturally-informed approaches to multilingual content moderation and provide empirical evidence for the importance of cultural context in developing more effective detection systems for vulnerable online communities.

* 20 pages, 1 figures

Via

Access Paper or Ask Questions

Instruction-Tuning Llama-3-8B Excels in City-Scale Mobility Prediction

Oct 31, 2024

Peizhi Tang, Chuang Yang, Tong Xing, Xiaohang Xu, Renhe Jiang, Kaoru Sezaki

Figure 1 for Instruction-Tuning Llama-3-8B Excels in City-Scale Mobility Prediction

Figure 2 for Instruction-Tuning Llama-3-8B Excels in City-Scale Mobility Prediction

Figure 3 for Instruction-Tuning Llama-3-8B Excels in City-Scale Mobility Prediction

Figure 4 for Instruction-Tuning Llama-3-8B Excels in City-Scale Mobility Prediction

Abstract:Human mobility prediction plays a critical role in applications such as disaster response, urban planning, and epidemic forecasting. Traditional methods often rely on designing crafted, domain-specific models, and typically focus on short-term predictions, which struggle to generalize across diverse urban environments. In this study, we introduce Llama-3-8B-Mob, a large language model fine-tuned with instruction tuning, for long-term citywide mobility prediction -- in a Q&A manner. We validate our approach using large-scale human mobility data from four metropolitan areas in Japan, focusing on predicting individual trajectories over the next 15 days. The results demonstrate that Llama-3-8B-Mob excels in modeling long-term human mobility -- surpassing the state-of-the-art on multiple prediction metrics. It also displays strong zero-shot generalization capabilities -- effectively generalizing to other cities even when fine-tuned only on limited samples from a single city. Source codes are available at https://github.com/TANGHULU6/Llama3-8B-Mob.

Via

Access Paper or Ask Questions

Taming the Long Tail in Human Mobility Prediction

Oct 19, 2024

Xiaohang Xu, Renhe Jiang, Chuang Yang, Zipei Fan, Kaoru Sezaki

Figure 1 for Taming the Long Tail in Human Mobility Prediction

Figure 2 for Taming the Long Tail in Human Mobility Prediction

Figure 3 for Taming the Long Tail in Human Mobility Prediction

Figure 4 for Taming the Long Tail in Human Mobility Prediction

Abstract:With the popularity of location-based services, human mobility prediction plays a key role in enhancing personalized navigation, optimizing recommendation systems, and facilitating urban mobility and planning. This involves predicting a user's next POI (point-of-interest) visit using their past visit history. However, the uneven distribution of visitations over time and space, namely the long-tail problem in spatial distribution, makes it difficult for AI models to predict those POIs that are less visited by humans. In light of this issue, we propose the Long-Tail Adjusted Next POI Prediction (LoTNext) framework for mobility prediction, combining a Long-Tailed Graph Adjustment module to reduce the impact of the long-tailed nodes in the user-POI interaction graph and a novel Long-Tailed Loss Adjustment module to adjust loss by logit score and sample weight adjustment strategy. Also, we employ the auxiliary prediction task to enhance generalization and accuracy. Our experiments with two real-world trajectory datasets demonstrate that LoTNext significantly surpasses existing state-of-the-art works. Our code is available at https://github.com/Yukayo/LoTNext.

* Accepted by NeurIPS 2024

Via

Access Paper or Ask Questions

SIMformer: Single-Layer Vanilla Transformer Can Learn Free-Space Trajectory Similarity

Oct 18, 2024

Chuang Yang, Renhe Jiang, Xiaohang Xu, Chuan Xiao, Kaoru Sezaki

Figure 1 for SIMformer: Single-Layer Vanilla Transformer Can Learn Free-Space Trajectory Similarity

Figure 2 for SIMformer: Single-Layer Vanilla Transformer Can Learn Free-Space Trajectory Similarity

Figure 3 for SIMformer: Single-Layer Vanilla Transformer Can Learn Free-Space Trajectory Similarity

Figure 4 for SIMformer: Single-Layer Vanilla Transformer Can Learn Free-Space Trajectory Similarity

Abstract:Free-space trajectory similarity calculation, e.g., DTW, Hausdorff, and Frechet, often incur quadratic time complexity, thus learning-based methods have been proposed to accelerate the computation. The core idea is to train an encoder to transform trajectories into representation vectors and then compute vector similarity to approximate the ground truth. However, existing methods face dual challenges of effectiveness and efficiency: 1) they all utilize Euclidean distance to compute representation similarity, which leads to the severe curse of dimensionality issue -- reducing the distinguishability among representations and significantly affecting the accuracy of subsequent similarity search tasks; 2) most of them are trained in triplets manner and often necessitate additional information which downgrades the efficiency; 3) previous studies, while emphasizing the scalability in terms of efficiency, overlooked the deterioration of effectiveness when the dataset size grows. To cope with these issues, we propose a simple, yet accurate, fast, scalable model that only uses a single-layer vanilla transformer encoder as the feature extractor and employs tailored representation similarity functions to approximate various ground truth similarity measures. Extensive experiments demonstrate our model significantly mitigates the curse of dimensionality issue and outperforms the state-of-the-arts in effectiveness, efficiency, and scalability.

Via

Access Paper or Ask Questions

A Detailed Audio-Text Data Simulation Pipeline using Single-Event Sounds

Mar 07, 2024

Xuenan Xu, Xiaohang Xu, Zeyu Xie, Pingyue Zhang, Mengyue Wu, Kai Yu

Figure 1 for A Detailed Audio-Text Data Simulation Pipeline using Single-Event Sounds

Figure 2 for A Detailed Audio-Text Data Simulation Pipeline using Single-Event Sounds

Figure 3 for A Detailed Audio-Text Data Simulation Pipeline using Single-Event Sounds

Figure 4 for A Detailed Audio-Text Data Simulation Pipeline using Single-Event Sounds

Abstract:Recently, there has been an increasing focus on audio-text cross-modal learning. However, most of the existing audio-text datasets contain only simple descriptions of sound events. Compared with classification labels, the advantages of such descriptions are significantly limited. In this paper, we first analyze the detailed information that human descriptions of audio may contain beyond sound event labels. Based on the analysis, we propose an automatic pipeline for curating audio-text pairs with rich details. Leveraging the property that sounds can be mixed and concatenated in the time domain, we control details in four aspects: temporal relationship, loudness, speaker identity, and occurrence number, in simulating audio mixtures. Corresponding details are transformed into captions by large language models. Audio-text pairs with rich details in text descriptions are thereby obtained. We validate the effectiveness of our pipeline with a small amount of simulated data, demonstrating that the simulated data enables models to learn detailed audio captioning.

Via

Access Paper or Ask Questions

Revisiting Mobility Modeling with Graph: A Graph Transformer Model for Next Point-of-Interest Recommendation

Oct 02, 2023

Xiaohang Xu, Toyotaro Suzumura, Jiawei Yong, Masatoshi Hanai, Chuang Yang, Hiroki Kanezashi, Renhe Jiang, Shintaro Fukushima

Abstract:Next Point-of-Interest (POI) recommendation plays a crucial role in urban mobility applications. Recently, POI recommendation models based on Graph Neural Networks (GNN) have been extensively studied and achieved, however, the effective incorporation of both spatial and temporal information into such GNN-based models remains challenging. Extracting distinct fine-grained features unique to each piece of information is difficult since temporal information often includes spatial information, as users tend to visit nearby POIs. To address the challenge, we propose \textbf{\underline{Mob}}ility \textbf{\underline{G}}raph \textbf{\underline{T}}ransformer (MobGT) that enables us to fully leverage graphs to capture both the spatial and temporal features in users' mobility patterns. MobGT combines individual spatial and temporal graph encoders to capture unique features and global user-location relations. Additionally, it incorporates a mobility encoder based on Graph Transformer to extract higher-order information between POIs. To address the long-tailed problem in spatial-temporal data, MobGT introduces a novel loss function, Tail Loss. Experimental results demonstrate that MobGT outperforms state-of-the-art models on various datasets and metrics, achieving 24\% improvement on average. Our codes are available at \url{https://github.com/Yukayo/MobGT}.

* Accepted as a full paper of SIGSPATIAL 2023

Via

Access Paper or Ask Questions

Privacy-Preserving Federated Depression Detection from Multi-Source Mobile Health Data

Mar 07, 2021

Xiaohang Xu, Hao Peng, Lichao Sun, Md Zakirul Alam Bhuiyan, Lianzhong Liu, Lifang He

Figure 1 for Privacy-Preserving Federated Depression Detection from Multi-Source Mobile Health Data

Figure 2 for Privacy-Preserving Federated Depression Detection from Multi-Source Mobile Health Data

Figure 3 for Privacy-Preserving Federated Depression Detection from Multi-Source Mobile Health Data

Figure 4 for Privacy-Preserving Federated Depression Detection from Multi-Source Mobile Health Data

Abstract:Depression is one of the most common mental illness problems, and the symptoms shown by patients are not consistent, making it difficult to diagnose in the process of clinical practice and pathological research.Although researchers hope that artificial intelligence can contribute to the diagnosis and treatment of depression, the traditional centralized machine learning needs to aggregate patient data, and the data privacy of patients with mental illness needs to be strictly confidential, which hinders machine learning algorithms clinical application.To solve the problem of privacy of the medical history of patients with depression, we implement federated learning to analyze and diagnose depression. First, we propose a general multi-view federated learning framework using multi-source data,which can extend any traditional machine learning model to support federated learning across different institutions or parties.Secondly, we adopt late fusion methods to solve the problem of inconsistent time series of multi-view data.Finally, we compare the federated framework with other cooperative learning frameworks in performance and discuss the related results.

Via

Access Paper or Ask Questions