Jiangneng Li

On Inferring User Socioeconomic Status with Mobility Records

Nov 15, 2022

Zheng Wang, Mingrui Liu, Cheng Long, Qianru Zhang, Jiangneng Li, Chunyan Miao

Abstract: When users move through a physical space (e.g., an urban area), devices such as mobile phones and GPS units generate mobility records (e.g., trajectories). These records capture essential information about how users work, live, and entertain themselves in their daily lives, and they have therefore been used in a wide range of tasks such as user profile inference, mobility prediction, and traffic management. In this paper, we expand this line of research by investigating the problem of inferring users' socioeconomic status from their mobility records, using the prices of users' homes as a proxy; this inference can potentially be used in real-life applications such as the car loan business. For this task, we propose a socioeconomic-aware deep model called DeepSEI. The DeepSEI model incorporates two networks, a deep network and a recurrent network, which extract features of the mobility records from three aspects (spatiality, temporality, and activity), one at a coarse level and the other at a detailed level. We conduct extensive experiments on real mobility records data, POI data, and house price data. The results verify that the DeepSEI model outperforms existing methods. All datasets used in this paper will be made publicly available.

* IEEE International Conference on Big Data (IEEE BigData 2022)

Entity Aware Syntax Tree Based Data Augmentation for Natural Language Understanding

Sep 06, 2022

Jiaxing Xu, Jianbin Cui, Jiangneng Li, Wenge Rong, Noboru Matsuda

Abstract: Understanding users' intentions and recognizing the semantic entities in their sentences, known as natural language understanding (NLU), is an upstream task for many natural language processing tasks. One of the main challenges is collecting enough annotated data to train a model. Existing research on text augmentation does not adequately consider entities and thus performs poorly on NLU tasks. To solve this problem, we propose a novel NLP data augmentation technique, Entity Aware Data Augmentation (EADA), which applies a tree structure, the Entity Aware Syntax Tree (EAST), to represent sentences combined with attention on the entity. Our EADA technique automatically constructs an EAST from a small amount of annotated data and then generates a large number of training instances for intent detection and slot filling. Experimental results on four datasets show that the proposed technique significantly outperforms existing data augmentation methods in terms of both accuracy and generalization ability.

Cardinality Estimation in DBMS: A Comprehensive Benchmark Evaluation

Sep 15, 2021

Yuxing Han, Ziniu Wu, Peizhi Wu, Rong Zhu, Jingyi Yang, Liang Wei Tan, Kai Zeng, Gao Cong, Yanzhao Qin, Andreas Pfadler (+4 more)

Abstract: Cardinality estimation (CardEst) plays a significant role in generating high-quality query plans for a query optimizer in a DBMS. In the last decade, an increasing number of advanced CardEst methods (especially ML-based ones) have been proposed with outstanding estimation accuracy and inference latency. However, no study systematically evaluates the quality of these methods and answers the fundamental question: to what extent can these methods improve the performance of a query optimizer in real-world settings, which is the ultimate goal of a CardEst method? In this paper, we comprehensively and systematically compare the effectiveness of CardEst methods in a real DBMS. We establish a new benchmark for CardEst, which contains a new complex real-world dataset, STATS, and a diverse query workload, STATS-CEB. We integrate several of the most representative CardEst methods into the open-source database system PostgreSQL and comprehensively evaluate their true effectiveness in improving query plan quality, along with other important aspects affecting their applicability: inference latency, model size, training time, update efficiency, and accuracy. We obtain a number of key findings for the CardEst methods under different data and query settings. Furthermore, we find that the widely used estimation accuracy metric (Q-Error) cannot distinguish the importance of different sub-plan queries during query optimization and thus cannot truly reflect the quality of the query plans generated by CardEst methods. Therefore, we propose a new metric, P-Error, to evaluate the performance of CardEst methods; it overcomes the limitation of Q-Error and reflects the overall end-to-end performance of CardEst methods. We have made all of the benchmark data and evaluation code publicly available at https://github.com/Nathaniel-Han/End-to-End-CardEst-Benchmark.
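For readers unfamiliar with the Q-Error metric mentioned in the abstract, the following is a minimal sketch of its commonly used definition: the larger of the two ratios between an estimated and a true cardinality. The function name `q_error` and the `floor` parameter (clamping both values to at least 1 to avoid division by zero) are illustrative assumptions, not taken from the paper or its benchmark code.

```python
def q_error(estimated: float, actual: float, floor: float = 1.0) -> float:
    """Return max(est/act, act/est) for one cardinality estimate.

    Assumption: both values are clamped to `floor` (1.0 here), a common
    convention for avoiding division by zero on empty results.
    """
    est = max(estimated, floor)
    act = max(actual, floor)
    return max(est / act, act / est)


if __name__ == "__main__":
    # Over- and under-estimation by the same factor give the same Q-Error.
    print(q_error(100, 10))
    print(q_error(10, 100))
    # A perfect estimate has Q-Error 1.
    print(q_error(50, 50))
```

Because the metric scores every sub-plan query in isolation and treats all of them equally, it illustrates the limitation the abstract describes: a large error on a sub-plan that does not affect the chosen plan counts as much as one that does.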

FSPN: A New Class of Probabilistic Graphical Model

Nov 20, 2020

Ziniu Wu, Rong Zhu, Andreas Pfadler, Yuxing Han, Jiangneng Li, Zhengping Qian, Kai Zeng, Jingren Zhou

Abstract: We introduce factorize sum split product networks (FSPNs), a new class of probabilistic graphical models (PGMs). FSPNs are designed to overcome the drawbacks of existing PGMs in terms of estimation accuracy and inference efficiency. Specifically, Bayesian networks (BNs) have low inference speed, and the performance of tree-structured sum product networks (SPNs) degrades significantly in the presence of highly correlated variables. FSPNs combine the advantages of both by adaptively modeling the joint distribution of variables according to their degree of dependence, so that one can simultaneously attain two desirable goals: high estimation accuracy and fast inference speed. We present efficient probability inference and structure learning algorithms for FSPNs, along with a theoretical analysis and an extensive experimental evaluation. Our experimental results on synthetic and benchmark datasets indicate the superiority of FSPNs over other PGMs.

* 16 pages
