Hye Won Chung

Exact Matching in Correlated Networks with Node Attributes for Improved Community Recovery

Jan 06, 2025

Joonhyuk Yang, Hye Won Chung

Abstract: We study community detection in multiple networks whose nodes and edges are jointly correlated. This setting arises naturally in applications such as social platforms, where a shared set of users may exhibit both correlated friendship patterns and correlated attributes across different platforms. Extending the classical Stochastic Block Model (SBM) and its contextual counterpart (CSBM), we introduce the correlated CSBM, which incorporates structural and attribute correlations across graphs. To build intuition, we first analyze correlated Gaussian Mixture Models, in which only correlated node attributes are available (no edges), and identify the conditions under which an estimator minimizing the distance between attributes achieves exact matching of nodes across the two databases. For correlated CSBMs, we develop a two-step procedure that first applies k-core matching to most nodes using edge information, then refines the matching for the remaining unmatched nodes by leveraging their attributes with a distance-based estimator. We identify the conditions under which the algorithm recovers the exact node correspondence, enabling us to merge the correlated edges and average the correlated attributes for enhanced community detection. Crucially, by aligning and combining graphs, we identify regimes in which community detection is impossible in a single graph but becomes feasible when side information from correlated graphs is incorporated. Our results illustrate how the interplay between graph matching and community recovery can boost performance, broadening the scope of multi-graph, attribute-based community detection.

* 30 pages, 3 figures
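The distance-based estimator for correlated Gaussian attributes can be illustrated with a small sketch. It assumes each node has a d-dimensional attribute vector in each database and picks the permutation that minimizes the total squared distance between matched vectors; the function name `match_by_attributes` and the toy data are hypothetical, and this is not the authors' code. Exhaustive search over permutations is only feasible for a handful of nodes. The same objective is a linear assignment problem, so at scale a Hungarian-type solver such as `scipy.optimize.linear_sum_assignment` would be the natural replacement.

```python
import itertools

import numpy as np


def match_by_attributes(X, Y):
    """Return pi with pi[i] = index of the row of Y matched to row i of X.

    Hypothetical helper: minimizes sum_i ||X[i] - Y[pi[i]]||^2 by exhaustive
    search over all permutations, so it is only usable for tiny n.
    """
    n = X.shape[0]
    # cost[i, j] = squared Euclidean distance between X[i] and Y[j]
    cost = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    rows = np.arange(n)
    best_perm, best_cost = None, np.inf
    for perm in itertools.permutations(range(n)):
        perm = np.array(perm)
        total = cost[rows, perm].sum()
        if total < best_cost:
            best_cost, best_perm = total, perm
    return best_perm


# Toy correlated databases (assumed model): Y is a shuffled, noisy copy of X.
rng = np.random.default_rng(0)
n, d, noise = 6, 50, 0.1
X = rng.standard_normal((n, d))
true_pi = rng.permutation(n)
Y = np.empty_like(X)
Y[true_pi] = X + noise * rng.standard_normal((n, d))  # Y[true_pi[i]] is the noisy copy of X[i]
pi_hat = match_by_attributes(X, Y)
print(np.array_equal(pi_hat, true_pi))  # True
```

At this noise level a correctly matched pair is far closer than any mismatched pair, so the minimizer recovers the hidden permutation exactly.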

Label Distribution Shift-Aware Prediction Refinement for Test-Time Adaptation

Nov 20, 2024

Minguk Jang, Hye Won Chung

Abstract: Test-time adaptation (TTA) is an effective approach to mitigating the performance degradation of trained models under input distribution shifts at test time. However, existing TTA methods often suffer significant performance drops when facing additional class distribution shifts. We first analyze TTA methods under label distribution shifts and identify class-wise confusion patterns that are commonly observed across different covariate shifts. Based on this observation, we introduce label Distribution shift-Aware prediction Refinement for Test-time adaptation (DART), a novel TTA method that refines predictions by focusing on class-wise confusion patterns. DART trains a prediction refinement module in an intermediate phase between training and testing, by exposing it to several batches with diverse class distributions drawn from the training dataset. This module is then used at test time to detect and correct class distribution shifts, significantly improving pseudo-label accuracy on test data. Our method achieves 5-18% accuracy gains under label distribution shifts on CIFAR-10C, without any performance degradation when there is no label distribution shift. Extensive experiments on the CIFAR, PACS, OfficeHome, and ImageNet benchmarks demonstrate DART's ability to correct inaccurate predictions caused by test-time distribution shifts. This improvement enhances the performance of existing TTA methods, making DART a valuable plug-in tool.

Representation Norm Amplification for Out-of-Distribution Detection in Long-Tail Learning

Aug 20, 2024

Dong Geun Shin, Hye Won Chung

Abstract: Detecting out-of-distribution (OOD) samples is a critical task for reliable machine learning. However, it becomes particularly challenging when models are trained on long-tailed datasets, as they often struggle to distinguish tail-class in-distribution samples from OOD samples. We examine the main challenges in this problem by identifying the trade-offs between OOD detection and in-distribution (ID) classification faced by existing methods. We then introduce our method, Representation Norm Amplification (RNA), which addresses this challenge by decoupling the two problems. The main idea is to use the norm of the representation as a new dimension for OOD detection, and to develop a training method that creates a noticeable discrepancy in the representation norm between ID and OOD data without perturbing the feature learning for ID classification. Our experiments show that RNA outperforms state-of-the-art methods in both OOD detection and classification, improving FPR95 by 1.70% and 9.46% and classification accuracy by 2.43% and 6.87% on CIFAR10-LT and ImageNet-LT, respectively. The code for this work is available at https://github.com/dgshin21/RNA.

* 30 pages, 8 figures, 17 tables
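The scoring side of the norm-based idea can be sketched as follows, together with the FPR95 metric quoted above. This is a hypothetical illustration, not RNA's training procedure: it assumes that training has already made ID representations have larger norms than OOD ones (the abstract only says the norms differ noticeably), uses synthetic features in place of a trained network, and names the helpers `norm_score` and `fpr_at_95_tpr` for illustration only.

```python
import numpy as np


def norm_score(features):
    """Hypothetical OOD score: L2 norm of each representation.

    Assumption: a larger norm means the sample looks more in-distribution.
    """
    return np.linalg.norm(features, axis=1)


def fpr_at_95_tpr(id_scores, ood_scores):
    """FPR95: fraction of OOD samples accepted as ID at the threshold keeping ~95% of ID samples."""
    threshold = np.percentile(id_scores, 5)  # roughly 95% of ID scores lie at or above this value
    return float(np.mean(ood_scores >= threshold))


# Synthetic stand-ins for representations: ID features with amplified norms, OOD with small norms.
rng = np.random.default_rng(0)
id_feats = 3.0 * rng.standard_normal((1000, 64))
ood_feats = 1.0 * rng.standard_normal((1000, 64))
print(fpr_at_95_tpr(norm_score(id_feats), norm_score(ood_feats)))  # 0.0
```

With well-separated norms no OOD sample clears the threshold; when the two score distributions coincide, FPR95 approaches 0.95.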

BWS: Best Window Selection Based on Sample Scores for Data Pruning across Broad Ranges

Jun 05, 2024

Hoyong Choi, Nohyun Ki, Hye Won Chung

Abstract: Data subset selection aims to find a smaller yet informative subset of a large dataset whose training approximates full-dataset training, addressing the challenges of training neural networks on large-scale datasets. However, existing methods tend to specialize in either high or low selection ratio regimes, and no universal approach consistently achieves competitive performance across a broad range of selection ratios. We introduce Best Window Selection (BWS), a universal and efficient data subset selection method that chooses the best window subset from samples ordered by their difficulty scores. This approach offers flexibility by allowing window intervals that span from easy to difficult samples. Furthermore, we provide an efficient mechanism for selecting the best window subset by evaluating its quality with kernel ridge regression. Our experimental results demonstrate the superior performance of BWS compared to other baselines across a broad range of selection ratios on datasets including CIFAR-10/100 and ImageNet, in scenarios involving either training from random initialization or fine-tuning of pre-trained models.

* ICML 2024
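A window search of this kind might look like the sketch below. Everything concrete here is an assumption rather than the paper's setup: lower scores are treated as easier samples, the kernel is linear, window quality is the accuracy on the full training set of a ridge-regression fit on the window, and the window start moves in steps of `stride`. The names `best_window_selection` and `krr_quality` are hypothetical.

```python
import numpy as np


def krr_quality(X, Y_onehot, idx, lam=1.0):
    """Fit kernel ridge regression on subset idx; return its accuracy on the full set.

    Assumption: a linear kernel K(a, b) = <a, b> on fixed features.
    """
    Xs, Ys = X[idx], Y_onehot[idx]
    K = Xs @ Xs.T  # kernel among the selected samples
    alpha = np.linalg.solve(K + lam * np.eye(len(idx)), Ys)
    preds = (X @ Xs.T) @ alpha  # kernel between all samples and the subset
    return float(np.mean(preds.argmax(axis=1) == Y_onehot.argmax(axis=1)))


def best_window_selection(scores, X, labels, ratio, num_classes, stride=1, lam=1.0):
    """Slide a window of size ratio*n over difficulty-sorted samples; keep the best window."""
    n = len(scores)
    k = max(1, int(round(ratio * n)))
    order = np.argsort(scores)  # easy -> hard, assuming a lower score means an easier sample
    Y = np.eye(num_classes)[labels]
    best_acc, best_idx = -1.0, None
    for start in range(0, n - k + 1, stride):
        idx = order[start:start + k]
        acc = krr_quality(X, Y, idx, lam)
        if acc > best_acc:
            best_acc, best_idx = acc, idx
    return best_idx, best_acc


# Toy data: random features, labels from the first coordinate, random stand-in difficulty scores.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 16))
labels = (X[:, 0] > 0).astype(int)
scores = rng.random(200)
subset, acc = best_window_selection(scores, X, labels, ratio=0.2, num_classes=2, stride=10)
print(len(subset), acc)
```

Each candidate subset is a contiguous slice of the sorted order, so sweeping the start position moves the window from the easiest samples toward the hardest ones.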

Understanding Self-Distillation and Partial Label Learning in Multi-Class Classification with Label Noise

Feb 16, 2024

Hyeonsu Jeong, Hye Won Chung

Abstract: Self-distillation (SD) is the process of training a student model using the outputs of a teacher model, where both models share the same architecture. Our study theoretically examines SD in multi-class classification with cross-entropy loss, exploring both multi-round SD and SD with refined teacher outputs, inspired by partial label learning (PLL). By deriving a closed-form solution for the student model's outputs, we find that SD essentially functions as label averaging among instances with high feature correlations. Initially, this averaging is beneficial: it helps the model focus on feature clusters correlated with a given instance when predicting its label. However, performance diminishes as the number of distillation rounds increases. Additionally, we demonstrate SD's effectiveness in label noise scenarios and identify the label corruption condition and the minimum number of distillation rounds needed to achieve 100% classification accuracy. Our study also reveals that, in high noise rate regimes, one-step distillation with refined teacher outputs outperforms multi-step SD that uses the teacher's direct outputs.
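The label-averaging view can be illustrated with a toy model. It is not the paper's closed-form solution: here each distillation round simply replaces every soft label with a weighted average of the previous round's labels, using a row-normalized exponential kernel on feature inner products with temperature `tau` (our own choice). With two clusters and one flipped label, one round corrects the noisy label, while many rounds wash out the cluster structure, mirroring the qualitative behavior described above. All names are hypothetical.

```python
import numpy as np


def averaging_weights(X, tau=1.0):
    """Row-normalized exponential kernel: more correlated instances get more weight (assumed form)."""
    S = np.exp(X @ X.T / tau)
    return S / S.sum(axis=1, keepdims=True)


def self_distill(X, noisy_labels, num_classes, rounds, tau=1.0):
    """Toy SD: each round replaces every soft label by a weighted average of the previous ones."""
    W = averaging_weights(X, tau)
    P = np.eye(num_classes)[noisy_labels]  # round 0: one-hot (noisy) training labels
    for _ in range(rounds):
        P = W @ P
    return P.argmax(axis=1)


# Two feature clusters of 5 instances each; one label in cluster A is flipped.
X = np.vstack([np.tile([1.0, 0.0], (5, 1)), np.tile([0.0, 1.0], (5, 1))])
true_labels = np.array([0] * 5 + [1] * 5)
noisy = true_labels.copy()
noisy[0] = 1
print(self_distill(X, noisy, 2, rounds=1))    # [0 0 0 0 0 1 1 1 1 1]: the flipped label is corrected
print(self_distill(X, noisy, 2, rounds=100))  # [1 1 1 1 1 1 1 1 1 1]: over-averaging merges the clusters
```

Because the averaging matrix mixes across clusters with positive weight, repeated rounds drive every soft label toward the global label frequencies, which is why accuracy eventually drops in this toy.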

Efficient Algorithms for Exact Graph Matching on Correlated Stochastic Block Models with Constant Correlation

Jun 02, 2023

Joonhyuk Yang, Dongpil Shin, Hye Won Chung

Abstract: We consider the problem of graph matching, or learning vertex correspondence, between two correlated stochastic block models (SBMs). The graph matching problem arises in various fields, including computer vision, natural language processing, and bioinformatics; in particular, matching graphs with inherent community structure is relevant to the de-anonymization of correlated social networks. For the correlated Erdos-Renyi (ER) model, various efficient algorithms have been developed, a few of which have been proven to achieve exact matching with constant edge correlation. In contrast, no low-order polynomial-time algorithm has been known to achieve exact matching for correlated SBMs with constant correlation. In this work, we propose an efficient algorithm for matching graphs with community structure, based on comparing partition trees rooted at each vertex, by extending the idea of Mao et al. (2021) to graphs with communities. The partition tree divides the large neighborhood of each vertex into disjoint subsets using their edge statistics to different communities. Our algorithm is the first low-order polynomial-time algorithm to achieve exact matching between two correlated SBMs with high probability in dense graphs.

* ICML 2023

Detection problems in the spiked matrix models

Jan 16, 2023

Ji Hyung Jung, Hye Won Chung, Ji Oon Lee

Abstract: We study the statistical decision process of detecting a low-rank signal from various signal-plus-noise data matrices, known as spiked random matrix models. We first show that principal component analysis can be improved by entrywise pre-transforming the data matrix if the noise is non-Gaussian, generalizing known results for spiked random matrix models with rank-1 signals. As an intermediate step, we find sharp phase transition thresholds for the extreme eigenvalues of spiked random matrices, which generalize the Baik-Ben Arous-Péché (BBP) transition. We also prove a central limit theorem for the linear spectral statistics of spiked random matrices and propose a hypothesis test based on it, which does not depend on the distribution of the signal or the noise. When the noise is non-Gaussian, the test can be improved by applying an entrywise transformation to the data matrix with additive noise. We also introduce an algorithm that estimates the rank of the signal when it is not known a priori.

* 80 pages, 6 figures. arXiv admin note: text overlap with arXiv:2104.13517
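For reference, the classical transition that the paper generalizes can be checked numerically in the simplest setting: a rank-1 spike plus Gaussian (GOE) noise, normalized so that the noise spectrum fills [-2, 2]. In that setting the largest eigenvalue converges to λ + 1/λ when the signal strength λ (`snr` in the code) exceeds 1 and stays at the bulk edge 2 otherwise. The sketch does not implement the paper's entrywise pre-transformation, test, or rank estimator, and the function names are hypothetical.

```python
import numpy as np


def top_eigenvalue_spiked_wigner(n, snr, seed=0):
    """Largest eigenvalue of M = snr * v v^T + W, with W a GOE matrix whose spectrum fills [-2, 2]."""
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((n, n))
    W = (G + G.T) / np.sqrt(2 * n)  # symmetric noise; off-diagonal entries have variance 1/n
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)  # unit-norm rank-1 signal direction
    M = snr * np.outer(v, v) + W
    return np.linalg.eigvalsh(M)[-1]  # eigvalsh returns eigenvalues in ascending order


def bbp_limit(snr):
    """Classical rank-1 BBP prediction: an outlier at snr + 1/snr appears only when snr > 1."""
    return snr + 1.0 / snr if snr > 1.0 else 2.0


for snr in [0.5, 1.5, 3.0]:
    print(snr, round(top_eigenvalue_spiked_wigner(1000, snr), 3), bbp_limit(snr))
```

At n = 1000 the simulated top eigenvalues should sit close to the limits, with finite-size deviations that shrink as n grows.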

Data Valuation Without Training of a Model

Jan 03, 2023

Nohyun Ki, Hoyong Choi, Hye Won Chung

Abstract: Many recent works on understanding deep learning try to quantify how much individual data instances influence the optimization and generalization of a model, either by analyzing the behavior of the model during training or by measuring the performance gap of the model when the instance is removed from the dataset. Such approaches reveal the characteristics and importance of individual instances, which may provide useful information for diagnosing and improving deep learning. However, most existing works on data valuation require actually training a model, which often incurs high computational cost. In this paper, we provide a training-free data valuation score, called the complexity-gap score, which is a data-centric score that quantifies the influence of individual instances on the generalization of two-layer overparameterized neural networks. The proposed score can quantify the irregularity of instances and measure how much each data instance contributes to the total movement of the network parameters during training. We theoretically analyze and empirically demonstrate the effectiveness of the complexity-gap score in finding 'irregular or mislabeled' data instances, and also provide applications of the score in analyzing datasets and diagnosing training dynamics.
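One way to compute a training-free score of this type is sketched below. It assumes, rather than quotes from the abstract, that the complexity measure is the NTK-based quantity sqrt(y^T H^{-1} y) used in generalization bounds for two-layer ReLU networks (Arora et al., 2019), where H is the infinite-width Gram matrix for unit-norm inputs, and that the score of instance i is the drop in this quantity when i is removed. Because H is positive definite for pairwise non-parallel inputs, a Schur-complement argument shows the gap is never negative. The function names are hypothetical.

```python
import numpy as np


def relu_ntk_gram(X):
    """H_ij = <x_i, x_j> (pi - arccos <x_i, x_j>) / (2 pi), assuming unit-norm rows of X."""
    G = np.clip(X @ X.T, -1.0, 1.0)  # clip guards arccos against round-off
    return G * (np.pi - np.arccos(G)) / (2.0 * np.pi)


def complexity_gap_scores(X, y):
    """Assumed score: sqrt(y^T H^{-1} y) - sqrt(y_{-i}^T H_{-i}^{-1} y_{-i}) for each instance i."""
    n = len(y)
    H = relu_ntk_gram(X)
    full = np.sqrt(y @ np.linalg.solve(H, y))
    scores = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i  # drop instance i
        H_i = H[np.ix_(keep, keep)]
        scores[i] = full - np.sqrt(y[keep] @ np.linalg.solve(H_i, y[keep]))
    return scores


rng = np.random.default_rng(0)
X = rng.standard_normal((30, 16))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # the Gram formula assumes unit-norm inputs
y = np.sign(X[:, 0])
y[0] = -y[0]  # flip one label to create an "irregular" instance
scores = complexity_gap_scores(X, y)
print(scores.argmax(), scores[0])
```

Only the Gram matrix and a few linear solves are needed, so no network is ever trained; the leave-one-out loop could be replaced by a single matrix inverse for larger n.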

Recovering Top-Two Answers and Confusion Probability in Multi-Choice Crowdsourcing

Dec 29, 2022

Hyeonsu Jeong, Hye Won Chung

Abstract: Crowdsourcing has emerged as an effective platform for labeling large volumes of data in a cost- and time-efficient manner. Most previous works have focused on designing efficient algorithms to recover only the ground-truth labels of the data. In this paper, we consider multi-choice crowdsourced labeling with the goal of recovering not only the ground truth but also the most confusing answer and the confusion probability. The most confusing answer provides useful information about the task by revealing the most plausible answer other than the ground truth and how plausible it is. To theoretically analyze such scenarios, we propose a model in which each task has top-two plausible answers that are distinguished from the rest of the choices. Task difficulty is quantified by the confusion probability between the top two, and worker reliability is quantified by the probability of giving an answer among the top two. Under this model, we propose a two-stage inference algorithm to infer the top-two answers as well as the confusion probability. We show that our algorithm achieves the minimax optimal convergence rate. We conduct both synthetic and real-data experiments and demonstrate that our algorithm outperforms other recent algorithms. We also show the applicability of our algorithms in inferring the difficulty of tasks and in training neural networks with soft labels composed of the top-two most plausible classes.
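As a point of comparison, a naive plurality-vote estimate of the top-two answers and the confusion probability for a single task fits in a few lines. This is not the paper's two-stage algorithm: it ignores worker reliability and estimates the confusion probability as the runner-up's share among answers that fall in the top two. The function name `naive_top_two` is hypothetical.

```python
from collections import Counter


def naive_top_two(answers):
    """Plurality-vote estimate of (top answer, second answer, confusion probability) for one task.

    Worker reliability is ignored entirely; this is a baseline, not a model-based estimator.
    """
    ranked = Counter(answers).most_common(2)
    if not ranked:
        raise ValueError("no answers for this task")
    if len(ranked) == 1:
        return ranked[0][0], None, 0.0
    (first, c1), (second, c2) = ranked
    # share of the runner-up among answers in the top two
    return first, second, c2 / (c1 + c2)


print(naive_top_two(["a", "a", "a", "b", "b", "c"]))  # ('a', 'b', 0.4)
```

A model-based method can improve on this by down-weighting workers who often answer outside the top two.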

Rank-1 Matrix Completion with Gradient Descent and Small Random Initialization

Dec 19, 2022

Daesung Kim, Hye Won Chung

Abstract: The nonconvex formulation of the matrix completion problem has received significant attention in recent years due to its affordable complexity compared to the convex formulation. Gradient descent (GD) is the simplest yet an efficient baseline algorithm for solving nonconvex optimization problems. The success of GD combined with random initialization has been witnessed in many different problems, both in theory and in practice. However, previous works on matrix completion require either careful initialization or regularizers to prove the convergence of GD. In this work, we study rank-1 symmetric matrix completion and prove that GD converges to the ground truth when small random initialization is used. We show that within a logarithmic number of iterations, the trajectory enters the region where local convergence occurs. We provide an upper bound on the initialization size that is sufficient to guarantee convergence and show that a larger initialization can be used as more samples become available. We observe that the implicit regularization effect of GD plays a critical role in the analysis: for the entire trajectory, it prevents each entry from becoming much larger than the others.
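The setting studied here can be reproduced in a small simulation: gradient descent on the observed entries of M = x x^T, started from a tiny random vector. The loss normalization f(u) = (1/(4p)) * sum over observed (i, j) of (u_i u_j - M_ij)^2, the symmetric Bernoulli(p) observation mask, the step size, the initialization scale, and the iteration count are all our assumptions, not values from the paper; the function name is hypothetical. With this normalization the gradient is (R u)/p, where R is the residual u u^T - M restricted to observed entries.

```python
import numpy as np


def rank1_completion_gd(M_obs, mask, p, eta=0.3, init_scale=1e-4, iters=1000, seed=0):
    """GD on f(u) = (1/(4p)) * sum_{observed (i,j)} (u_i u_j - M_ij)^2 from a small random start."""
    n = M_obs.shape[0]
    rng = np.random.default_rng(seed)
    u = init_scale * rng.standard_normal(n) / np.sqrt(n)  # small random initialization
    for _ in range(iters):
        R = mask * (np.outer(u, u) - M_obs)  # residual on observed entries only
        u = u - eta * (R @ u) / p  # gradient of f is (R u) / p because R is symmetric
    return u


rng = np.random.default_rng(1)
n, p = 100, 0.5
x = rng.standard_normal(n)
x /= np.linalg.norm(x)  # ground truth with ||x|| = 1, so M = x x^T has spectral norm 1
upper = np.triu(rng.random((n, n)) < p)
mask = (upper | upper.T).astype(float)  # symmetric pattern: each pair (i, j) observed with prob. p
M_obs = mask * np.outer(x, x)
u = rank1_completion_gd(M_obs, mask, p)
err = min(np.linalg.norm(u - x), np.linalg.norm(u + x))  # x is identifiable only up to sign
print(err)
```

Since x and -x produce the same matrix, the error is measured up to sign. In the early iterations u is tiny, so the update is dominated by the linear term and u grows along the leading direction of the observed matrix before the local convergence phase takes over.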
