Yanyong Huang

Iterative Feature Space Optimization through Incremental Adaptive Evaluation

Jan 24, 2025

Yanping Wu, Yanyong Huang, Zhengzhang Chen, Zijun Yao, Yanjie Fu, Kunpeng Liu, Xiao Luo, Dongjie Wang

Abstract: Iterative feature space optimization involves systematically evaluating and adjusting the feature space to improve downstream task performance. However, existing works suffer from three key limitations: 1) overlooking differences among data samples leads to evaluation bias; 2) tailoring feature spaces to specific machine learning models results in overfitting and poor generalization; 3) requiring the evaluator to be retrained from scratch during each optimization iteration significantly reduces the overall efficiency of the optimization process. To bridge these gaps, we propose a gEneralized Adaptive feature Space Evaluator (EASE) to efficiently produce optimal and generalized feature spaces. This framework consists of two key components: Feature-Sample Subspace Generator and Contextual Attention Evaluator. The first component aims to decouple the information distribution within the feature space to mitigate evaluation bias. To achieve this, we first identify features most relevant to prediction tasks and samples most challenging for evaluation based on feedback from the subsequent evaluator. This decoupling strategy makes the evaluator consistently target the most challenging aspects of the feature space. The second component intends to incrementally capture evolving patterns of the feature space for efficient evaluation. We propose a weighted-sharing multi-head attention mechanism to encode key characteristics of the feature space into an embedding vector for evaluation. Moreover, the evaluator is updated incrementally, retaining prior evaluation knowledge while incorporating new insights, as consecutive feature spaces during the optimization process share partial information. Extensive experiments on fourteen real-world datasets demonstrate the effectiveness of the proposed framework. Our code and data are publicly available.

* 18 pages
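As a rough illustration of how an attention mechanism can encode a feature space into a single embedding vector, the sketch below applies plain multi-head self-attention over feature columns and mean-pools the result. It is not the weighted-sharing variant or the incremental update described in the EASE abstract; the function name `attention_embedding`, the projection matrices `Wq`, `Wk`, `Wv`, and the choice to treat each feature column as one token are all assumptions made for illustration.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_embedding(X, Wq, Wk, Wv, num_heads):
    """Encode a feature space X (n_samples, n_features) into one vector.

    Assumption: each feature column is one token, so the projection
    matrices have shape (n_samples, d_model). This is generic multi-head
    self-attention, not the paper's weighted-sharing mechanism.
    """
    tokens = X.T                                   # (n_features, n_samples)
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    n_tokens, d_model = Q.shape
    d_head = d_model // num_heads

    def split(M):
        # (n_tokens, d_model) -> (num_heads, n_tokens, d_head)
        return M.reshape(n_tokens, num_heads, d_head).transpose(1, 0, 2)

    Qh, Kh, Vh = split(Q), split(K), split(V)
    scores = Qh @ Kh.transpose(0, 2, 1) / np.sqrt(d_head)
    attn = softmax(scores, axis=-1)                # (heads, tokens, tokens)
    out = attn @ Vh                                # (heads, tokens, d_head)
    out = out.transpose(1, 0, 2).reshape(n_tokens, d_model)
    return out.mean(axis=0)                        # pooled embedding (d_model,)

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))                        # 6 samples, 4 features
Wq, Wk, Wv = (rng.normal(size=(6, 8)) for _ in range(3))
embedding = attention_embedding(X, Wq, Wk, Wv, num_heads=2)
```

Because no positional encoding is used and the tokens are mean-pooled, the embedding in this sketch does not depend on the order of the feature columns, which is a convenient property when candidate feature spaces are sets of columns.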

Towards Data-Centric AI: A Comprehensive Survey of Traditional, Reinforcement, and Generative Approaches for Tabular Data Transformation

Jan 17, 2025

Dongjie Wang, Yanyong Huang, Wangyang Ying, Haoyue Bai, Nanxu Gong, Xinyuan Wang, Sixun Dong, Tao Zhe, Kunpeng Liu, Meng Xiao(+4 more)

Abstract: Tabular data is one of the most widely used formats across industries, driving critical applications in areas such as finance, healthcare, and marketing. In the era of data-centric AI, improving data quality and representation has become essential for enhancing model performance, particularly in applications centered around tabular data. This survey examines the key aspects of tabular data-centric AI, emphasizing feature selection and feature generation as essential techniques for data space refinement. We provide a systematic review of feature selection methods, which identify and retain the most relevant data attributes, and feature generation approaches, which create new features to simplify the capture of complex data patterns. This survey offers a comprehensive overview of current methodologies through an analysis of recent advancements, practical applications, and the strengths and limitations of these techniques. Finally, we outline open challenges and suggest future perspectives to inspire continued innovation in this field.

CONDEN-FI: Consistency and Diversity Learning-based Multi-View Unsupervised Feature and Instance Co-Selection

Dec 09, 2024

Yanyong Huang, Yuxin Cai, Dongjie Wang, Xiuwen Yi, Tianrui Li

Abstract: The objective of multi-view unsupervised feature and instance co-selection is to simultaneously identify the most representative features and samples from multi-view unlabeled data, which aids in mitigating the curse of dimensionality and reducing instance size to improve the performance of downstream tasks. However, existing methods treat feature selection and instance selection as two separate processes, failing to leverage the potential interactions between the feature and instance spaces. Additionally, previous co-selection methods for multi-view data require concatenating different views, which overlooks the consistent information among them. In this paper, we propose a CONsistency and DivErsity learNing-based multi-view unsupervised Feature and Instance co-selection (CONDEN-FI) to address the above-mentioned issues. Specifically, CONDEN-FI reconstructs multi-view data from both the sample and feature spaces to learn representations that are consistent across views and specific to each view, enabling the simultaneous selection of the most important features and instances. Moreover, CONDEN-FI adaptively learns a view-consensus similarity graph to help select both dissimilar and similar samples in the reconstructed data space, leading to a more diverse selection of instances. An efficient algorithm is developed to solve the resultant optimization problem, and the comprehensive experimental results on real-world datasets demonstrate that CONDEN-FI is effective compared to state-of-the-art methods.

Causally-Aware Unsupervised Feature Selection Learning

Oct 16, 2024

Zongxin Shen, Yanyong Huang, Minbo Ma, Tianrui Li

Abstract: Unsupervised feature selection (UFS) has recently gained attention for its effectiveness in processing unlabeled high-dimensional data. However, existing methods overlook the intrinsic causal mechanisms within the data, resulting in the selection of irrelevant features and poor interpretability. Additionally, previous graph-based methods fail to account for the differing impacts of non-causal and causal features in constructing the similarity graph, which leads to false links in the generated graph. To address these issues, a novel UFS method, called Causally-Aware UnSupErvised Feature Selection learning (CAUSE-FS), is proposed. CAUSE-FS introduces a novel causal regularizer that reweights samples to balance the confounding distribution of each treatment feature. This regularizer is subsequently integrated into a generalized unsupervised spectral regression model to mitigate spurious associations between features and clustering labels, thus achieving causal feature selection. Furthermore, CAUSE-FS employs causality-guided hierarchical clustering to partition features with varying causal contributions into multiple granularities. By integrating similarity graphs learned adaptively at different granularities, CAUSE-FS increases the importance of causal features when constructing the fused similarity graph to capture the reliable local structure of data. Extensive experimental results demonstrate the superiority of CAUSE-FS over state-of-the-art methods, with its interpretability further validated through feature visualization.

Adaptive Collaborative Correlation Learning-based Semi-Supervised Multi-Label Feature Selection

Jun 18, 2024

Yanyong Huang, Li Yang, Dongjie Wang, Ke Li, Xiuwen Yi, Fengmao Lv, Tianrui Li

Abstract: Semi-supervised multi-label feature selection has recently been developed to address the curse of dimensionality in high-dimensional multi-label data where some samples are missing labels. Although many efforts have been made, most existing methods use a predefined graph to capture the sample similarity or the label correlation. With this approach, noise and outliers in the original feature space can undermine the reliability of the resulting sample similarity graph. A predefined graph also fails to precisely depict the label correlation because some labels are unknown. Besides, these methods only consider the discriminative power of selected features, while neglecting their redundancy. In this paper, we propose an Adaptive Collaborative Correlation lEarning-based Semi-Supervised Multi-label Feature Selection (Access-MFS) method to address these issues. Specifically, a generalized regression model equipped with an extended uncorrelated constraint is introduced to select discriminative yet non-redundant features and, simultaneously, maintain consistency between predicted and ground-truth labels in labeled data. Then, the instance correlation and label correlation are integrated into the proposed regression model to adaptively learn both the sample similarity graph and the label similarity graph, which mutually enhance feature selection performance. Extensive experimental results demonstrate the superiority of the proposed Access-MFS over other state-of-the-art methods.

Unified View Imputation and Feature Selection Learning for Incomplete Multi-view Data

Jan 19, 2024

Yanyong Huang, Zongxin Shen, Tianrui Li, Fengmao Lv

Abstract: Although multi-view unsupervised feature selection (MUFS) is an effective technique for reducing dimensionality in machine learning, existing methods cannot directly deal with incomplete multi-view data where some samples are missing in certain views. These methods must first impute the missing data with predetermined values, then perform feature selection on the completed dataset. Separating the imputation and feature selection processes fails to capitalize on the potential synergy where local structural information gleaned from feature selection could guide the imputation, thereby improving the feature selection performance in turn. Additionally, previous methods only focus on leveraging samples' local structure information, while ignoring the intrinsic locality of the feature space. To tackle these problems, a novel MUFS method, called UNified view Imputation and Feature selectIon lEaRning (UNIFIER), is proposed. UNIFIER explores the local structure of multi-view data by adaptively learning similarity-induced graphs from both the sample and feature spaces. Then, UNIFIER dynamically recovers the missing views, guided by the sample and feature similarity graphs during the feature selection procedure. Furthermore, the half-quadratic minimization technique is used to automatically weight different instances, alleviating the impact of outliers and unreliable restored data. Comprehensive experimental results demonstrate that UNIFIER outperforms other state-of-the-art methods.
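To make the idea of half-quadratic instance weighting concrete, here is a minimal sketch on a much simpler problem than UNIFIER's: robustly estimating a mean. The abstract does not state which robust loss is used, so the Welsch loss, the bandwidth `sigma`, and the function name `half_quadratic_mean` are assumptions. The alternation itself (closed-form weights from residuals, then a weighted refit) is the standard half-quadratic pattern.

```python
import numpy as np

def half_quadratic_mean(X, sigma=1.0, n_iter=20):
    """Robust mean under the Welsch loss via half-quadratic alternation.

    Assumed objective: sum_i (1 - exp(-||x_i - mu||^2 / (2 sigma^2))).
    Step 1 sets per-instance weights in closed form; step 2 solves the
    resulting weighted least-squares problem for mu.
    """
    mu = np.median(X, axis=0)                      # robust starting point
    for _ in range(n_iter):
        r2 = ((X - mu) ** 2).sum(axis=1)           # squared residuals
        w = np.exp(-r2 / (2.0 * sigma ** 2))       # auxiliary weights
        mu = (w[:, None] * X).sum(axis=0) / w.sum()
    # Recompute weights so they match the returned mu.
    r2 = ((X - mu) ** 2).sum(axis=1)
    w = np.exp(-r2 / (2.0 * sigma ** 2))
    return mu, w

rng = np.random.default_rng(0)
inliers = rng.normal(0.0, 0.1, size=(50, 2))
outliers = np.full((5, 2), 10.0)                   # gross outliers
mu, w = half_quadratic_mean(np.vstack([inliers, outliers]))
```

Instances far from the current estimate receive weights close to zero, so they barely influence the next refit. This is the same mechanism the abstract relies on to down-weight outliers and unreliable imputed entries.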

Automated Urban Planning aware Spatial Hierarchies and Human Instructions

Sep 26, 2022

Dongjie Wang, Kunpeng Liu, Yanyong Huang, Leilei Sun, Bowen Du, Yanjie Fu

Abstract: Traditional urban planning requires urban experts to spend considerable time and effort producing an optimal urban plan under many architectural constraints. The remarkable imaginative ability of deep generative learning provides hope for renovating urban planning. While automated urban planners have been examined, they are constrained because of the following: 1) neglecting human requirements in urban planning; 2) omitting spatial hierarchies in urban planning; and 3) lacking numerous urban plan data samples. To overcome these limitations, we propose a novel, deep, human-instructed urban planner. In the preliminary work, we formulate it into an encoder-decoder paradigm. The encoder learns the information distribution of surrounding contexts, human instructions, and land-use configuration. The decoder reconstructs the land-use configuration and the associated urban functional zones. The reconstruction procedure captures the spatial hierarchies between functional zones and spatial grids. Meanwhile, we introduce a variational Gaussian mechanism to mitigate the data sparsity issue. Even though early work has led to good results, the performance of generation is still unstable because the way spatial hierarchies are captured may lead to unclear optimization directions. In this journal version, we propose a cascading deep generative framework based on generative adversarial networks (GANs) to solve this problem, inspired by the workflow of urban experts. In particular, the purpose of the first GAN is to build urban functional zones based on information from human instructions and surrounding contexts. The second GAN then produces the land-use configuration based on the functional zones that have been constructed. Additionally, we provide a conditioning augmentation module to augment data samples. Finally, we conduct extensive experiments to validate the efficacy of our work.

* KAIS Under Review. arXiv admin note: text overlap with arXiv:2110.07717

C$^{2}$IMUFS: Complementary and Consensus Learning-based Incomplete Multi-view Unsupervised Feature Selection

Aug 20, 2022

Yanyong Huang, Zongxin Shen, Yuxin Cai, Xiuwen Yi, Dongjie Wang, Fengmao Lv, Tianrui Li

Abstract: Multi-view unsupervised feature selection (MUFS) has been demonstrated as an effective technique to reduce the dimensionality of multi-view unlabeled data. The existing methods assume that all views are complete. However, multi-view data are usually incomplete, i.e., some instances are present in some views but not in all views. Besides, learning the complete similarity graph, an important and promising technique in existing MUFS methods, cannot be achieved because of the missing views. In this paper, we propose a complementary and consensus learning-based incomplete multi-view unsupervised feature selection method (C$^{2}$IMUFS) to address the aforementioned issues. Concretely, C$^{2}$IMUFS integrates feature selection into an extended weighted non-negative matrix factorization model equipped with adaptive learning of view-weights and a sparse $\ell_{2,p}$-norm, which can offer better adaptability and flexibility. By the sparse linear combinations of multiple similarity matrices derived from different views, a complementary learning-guided similarity matrix reconstruction model is presented to obtain the complete similarity graph in each view. Furthermore, C$^{2}$IMUFS learns a consensus clustering indicator matrix across different views and embeds it into a spectral graph term to preserve the local geometric structure. Comprehensive experimental results on real-world datasets demonstrate the effectiveness of C$^{2}$IMUFS compared with state-of-the-art methods.
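For readers unfamiliar with the $\ell_{2,p}$-norm mentioned in the abstract, the sketch below computes it for a matrix whose rows correspond to features, using the common definition $\|W\|_{2,p} = (\sum_i \|w_i\|_2^p)^{1/p}$ (a quasi-norm for $0 < p < 1$). Ranking features by the $\ell_2$ norm of their rows is a common convention in sparse-regression-style feature selection and is assumed here; the abstract does not spell out the selection rule, and the function names are illustrative.

```python
import numpy as np

def l2p_norm(W, p):
    """(sum_i ||w_i||_2^p)^(1/p), where w_i is the i-th row of W."""
    row_norms = np.linalg.norm(W, axis=1)
    return (row_norms ** p).sum() ** (1.0 / p)

def rank_features(W):
    """Feature indices sorted by decreasing row norm (assumed rule)."""
    return np.argsort(-np.linalg.norm(W, axis=1))

# Rows are features; the all-zero row marks a feature that the sparse
# penalty has effectively discarded.
W = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [1.0, 0.0]])

norm_p1 = l2p_norm(W, p=1.0)    # sum of row norms
norm_p2 = l2p_norm(W, p=2.0)    # coincides with the Frobenius norm
order = rank_features(W)
```

Smaller values of `p` penalize many small non-zero rows more heavily relative to a few large ones, which is why such penalties tend to drive entire rows (features) to zero.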

Incremental Unsupervised Feature Selection for Dynamic Incomplete Multi-view Data

Apr 05, 2022

Yanyong Huang, Kejun Guo, Xiuwen Yi, Zhong Li, Tianrui Li

Abstract: Multi-view unsupervised feature selection has been proven to be efficient in reducing the dimensionality of high-dimensional multi-view unlabeled data. The previous methods assume that all of the views are complete. However, in real applications, multi-view data are often incomplete, i.e., some views of instances are missing, which causes these methods to fail. Besides, when the data arrive in the form of streams, these existing methods suffer from high storage cost and expensive computation time. To address these issues, we propose an Incremental Incomplete Multi-view Unsupervised Feature Selection method (I$^2$MUFS) on incomplete multi-view streaming data. By jointly considering the consistent and complementary information across different views, I$^2$MUFS embeds the unsupervised feature selection into an extended weighted non-negative matrix factorization model, which can learn a consensus clustering indicator matrix and fuse different latent feature matrices with adaptive view weights. Furthermore, we introduce incremental learning mechanisms to develop an alternating iterative algorithm, where the feature selection matrix is incrementally updated, rather than recomputed on the entire updated data from scratch. A series of experiments are conducted to verify the effectiveness of the proposed method by comparing it with several state-of-the-art methods. The experimental results demonstrate the effectiveness and efficiency of the proposed method in terms of the clustering metrics and the computational cost.
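The weighted non-negative matrix factorization at the core of this model can be illustrated with a single-view sketch in which the weights are a binary mask marking observed entries. That choice of weights, the function name `weighted_nmf`, and the use of standard multiplicative updates are assumptions for illustration; the paper's consensus indicator matrix, adaptive view weights, and incremental updates are not reproduced here.

```python
import numpy as np

def weighted_nmf(X, M, rank, n_iter=200, eps=1e-9, seed=0):
    """Minimize ||M * (X - U @ V)||_F^2 with U, V >= 0.

    X: (n, d) non-negative data; M: (n, d) weights, here assumed to be
    a 0/1 mask where 0 marks a missing entry. Uses multiplicative
    updates, which keep U and V non-negative.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    U = rng.random((n, rank)) + 0.1
    V = rng.random((rank, d)) + 0.1
    MX = M * X
    errors = []
    for _ in range(n_iter):
        U *= (MX @ V.T) / ((M * (U @ V)) @ V.T + eps)
        V *= (U.T @ MX) / (U.T @ (M * (U @ V)) + eps)
        errors.append(np.linalg.norm(M * (X - U @ V)) ** 2)
    return U, V, errors

rng = np.random.default_rng(1)
X = rng.random((20, 3)) @ rng.random((3, 8))       # non-negative, rank 3
M = (rng.random(X.shape) < 0.8).astype(float)      # about 80% observed
U, V, errors = weighted_nmf(X, M, rank=3)
```

Masked entries contribute nothing to either update, so missing values never need to be filled in before factorizing. In a streaming setting one could warm-start `U` and `V` from the previous chunk instead of re-initializing, which is the general intuition behind incremental variants.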

ScaleVLAD: Improving Multimodal Sentiment Analysis via Multi-Scale Fusion of Locally Descriptors

Dec 02, 2021

Huaishao Luo, Lei Ji, Yanyong Huang, Bin Wang, Shenggong Ji, Tianrui Li

Abstract: Fusion techniques are a key research topic in multimodal sentiment analysis. Recent attention-based fusion demonstrates advances over simple operation-based fusion. However, these fusion works adopt a single-scale unimodal representation, i.e., token-level or utterance-level. Such single-scale fusion is suboptimal because different modalities should be aligned at different granularities. This paper proposes a fusion model named ScaleVLAD to gather multi-Scale representation from text, video, and audio with shared Vectors of Locally Aggregated Descriptors to improve unaligned multimodal sentiment analysis. These shared vectors can be regarded as shared topics to align different modalities. In addition, we propose a self-supervised shifted clustering loss to keep the fused features differentiated among samples. The backbones are three Transformer encoders corresponding to three modalities, and the aggregated features generated from the fusion module are fed to a Transformer plus a fully connected layer to finish task predictions. Experiments on three popular sentiment analysis benchmarks, IEMOCAP, MOSI, and MOSEI, demonstrate significant gains over baselines.
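The VLAD aggregation that the model builds on can be sketched as follows. This is a generic soft-assignment VLAD over one sequence of descriptors, in the style of NetVLAD; the function name `soft_vlad`, the sharpness parameter `alpha`, and the normalization steps are assumptions, and the multi-scale inputs, learned centers, and shifted clustering loss of ScaleVLAD are not reproduced.

```python
import numpy as np

def soft_vlad(X, centers, alpha=1.0):
    """Aggregate T descriptors into one fixed-length vector.

    X: (T, d) descriptors from one modality; centers: (K, d) cluster
    centers, which would be shared across modalities. Returns a
    unit-norm vector of length K * d.
    """
    diff = X[:, None, :] - centers[None, :, :]          # (T, K, d) residuals
    logits = -alpha * (diff ** 2).sum(axis=2)           # (T, K)
    logits -= logits.max(axis=1, keepdims=True)         # numerical stability
    assign = np.exp(logits)
    assign /= assign.sum(axis=1, keepdims=True)         # soft assignments
    V = (assign[:, :, None] * diff).sum(axis=0)         # (K, d)
    V /= np.linalg.norm(V, axis=1, keepdims=True) + 1e-12   # per-center norm
    v = V.ravel()
    return v / (np.linalg.norm(v) + 1e-12)

rng = np.random.default_rng(0)
centers = rng.normal(size=(3, 4))
short_seq = rng.normal(size=(7, 4))                     # e.g. 7 tokens
long_seq = rng.normal(size=(20, 4))                     # e.g. 20 frames
v_short = soft_vlad(short_seq, centers)
v_long = soft_vlad(long_seq, centers)
```

Sequences of different lengths map to vectors of the same size, defined relative to the same centers. That is what allows unaligned modalities, or the same modality at different granularities, to be compared in a common space.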
