Jaesung Lee

Exploiting Fine-Grained Skip Behaviors for Micro-Video Recommendation

Apr 04, 2025

Sanghyuck Lee, Sangkeun Park, Jaesung Lee

Abstract: The growing trend of sharing short videos on social media platforms, where users capture and share moments from their daily lives, has led to an increase in research efforts focused on micro-video recommendations. However, conventional methods oversimplify the modeling of skip behavior, categorizing interactions solely as positive or negative based on whether skipping occurs. This study was motivated by the importance of the first few seconds of micro-videos, leading to a refinement of signals into three distinct categories: highly positive, less positive, and negative. Specifically, we classify skip interactions occurring within a short time as negatives, while those occurring after a delay are categorized as less positive. The proposed dual-level graph and hierarchical ranking loss are designed to effectively learn these fine-grained interactions. Our experiments demonstrated that the proposed method outperformed three conventional methods across eight evaluation measures on two public datasets.

* 9 pages, 5 figures. Published in Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2025
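
The three-way labeling described in this abstract can be illustrated with a minimal sketch. The abstract does not give the cutoff that separates an early skip from a delayed one, so `EARLY_SKIP_SECONDS`, the `Interaction` record, and the rule that an unskipped view counts as highly positive are all assumptions made for illustration, not the authors' implementation.

```python
from dataclasses import dataclass

# Hypothetical cutoff in seconds; the abstract only says "a short time".
EARLY_SKIP_SECONDS = 3.0


@dataclass
class Interaction:
    """Hypothetical record of one user-video interaction."""
    user_id: int
    video_id: int
    skipped: bool
    watch_seconds: float


def label_interaction(event, early_skip_seconds=EARLY_SKIP_SECONDS):
    """Map an interaction to one of three feedback levels."""
    if not event.skipped:
        # Assumption: a view without a skip is the strongest signal.
        return "highly_positive"
    if event.watch_seconds < early_skip_seconds:
        # Skipped within the first few seconds.
        return "negative"
    # Skipped, but only after a delay.
    return "less_positive"
```

A ranking objective could then order the three levels (highly positive above less positive above negative), which is the role the abstract assigns to the hierarchical ranking loss.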

BitAbuse: A Dataset of Visually Perturbed Texts for Defending Phishing Attacks

Feb 06, 2025

Hanyong Lee, Chaelyn Lee, Yongjae Lee, Jaesung Lee

Abstract: Phishing often targets victims through visually perturbed texts to bypass security systems. The noise contained in these texts functions as an adversarial attack, designed to deceive language models and hinder their ability to accurately interpret the content. However, since it is difficult to obtain sufficient phishing cases, previous studies have used synthetic datasets that do not contain real-world cases. In this study, we propose the BitAbuse dataset, which includes real-world phishing cases, to address the limitations of previous research. Our dataset comprises a total of 325,580 visually perturbed texts. The dataset inputs are drawn from the raw corpus, consisting of visually perturbed sentences and sentences generated through an artificial perturbation process. Each input sentence is labeled with its corresponding ground truth, representing the restored, non-perturbed version. Language models trained on our proposed dataset demonstrated significantly better performance compared to previous methods, achieving an accuracy of approximately 96%. Our analysis revealed a significant gap between real-world and synthetic examples, underscoring the value of our dataset for building reliable pre-trained models for restoration tasks. We release the BitAbuse dataset, which includes real-world phishing cases annotated with visual perturbations, to support future research in adversarial attack defense.

* 18 pages, To appear in the Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics 2025
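
As a rough illustration of what an artificial perturbation process paired with ground-truth labels might look like, the sketch below swaps characters for look-alikes and keeps the clean sentence as the label. The `LOOKALIKES` table, the `rate` parameter, and the function names are hypothetical and are not taken from the BitAbuse paper.

```python
import random

# Hypothetical look-alike substitutions, chosen only for illustration.
LOOKALIKES = {"a": "@", "e": "3", "i": "1", "o": "0", "s": "$"}


def perturb(sentence, rate=0.3, seed=0):
    """Replace each mappable character with its look-alike with probability `rate`."""
    rng = random.Random(seed)
    return "".join(
        LOOKALIKES[ch] if ch in LOOKALIKES and rng.random() < rate else ch
        for ch in sentence
    )


def make_pair(sentence, rate=0.3, seed=0):
    """Build one example: perturbed input plus its restored ground truth."""
    return {"input": perturb(sentence, rate, seed), "label": sentence}
```

A restoration model would be trained to map `input` back to `label`. The abstract's reported gap between real-world and synthetic examples suggests that simple substitution tables like this one do not capture everything seen in actual phishing texts.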

Cost-constrained multi-label group feature selection using shadow features

Aug 03, 2024

Tomasz Klonecki, Paweł Teisseyre, Jaesung Lee

Abstract: We consider the problem of feature selection in multi-label classification, taking into account the costs assigned to groups of features. In this task, the goal is to select a subset of features that will be useful for predicting the label vector, while ensuring that the cost associated with the selected features does not exceed the assumed budget. Solving the problem is of great importance in medicine, where we may be interested in predicting various diseases based on groups of features. The groups may be associated with parameters obtained from a certain diagnostic test, such as a blood test. Because diagnostic test costs can be very high, considering cost information when selecting relevant features becomes crucial to reducing the cost of making predictions. We focus on the feature selection method based on information theory. The proposed method consists of two steps. First, we select features sequentially while maximizing conditional mutual information until the budget is exhausted. In the second step, we select additional cost-free features, i.e., those coming from groups that have already been used in previous steps. The number of added features can be limited using a stop rule based on the concept of so-called shadow features, which are randomized counterparts of the original ones. In contrast to existing approaches based on penalized criteria, our method avoids the need for computationally demanding optimization of the penalty parameter. Experiments conducted on the MIMIC medical database show the effectiveness of the method, especially when the assumed budget is limited.
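
The two-step procedure in this abstract can be sketched as follows. This is a simplified reading, not the authors' algorithm: the caller-supplied `score(column, chosen_columns)` stands in for the conditional mutual information criterion, step 1 is assumed to buy one new group per pick, and shadow features are assumed to be row-permuted copies of the remaining candidates. All function and parameter names are hypothetical.

```python
import numpy as np


def select_features(X, groups, group_cost, budget, score, seed=0):
    """Greedy group-cost-aware selection with a shadow-feature stop rule.

    X: (n_samples, n_features) array; groups[j]: group id of feature j;
    group_cost: dict mapping group id to cost;
    score(column, chosen_columns) -> float, a stand-in relevance criterion.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    selected, paid, spent = [], set(), 0.0

    def chosen():
        return X[:, np.array(selected, dtype=int)]

    # Step 1: pick the best feature from an unpaid group that still fits the budget.
    while True:
        affordable = [j for j in range(n_features)
                      if groups[j] not in paid
                      and spent + group_cost[groups[j]] <= budget]
        if not affordable:
            break
        best = max(affordable, key=lambda j: score(X[:, j], chosen()))
        spent += group_cost[groups[best]]
        paid.add(groups[best])
        selected.append(best)

    # Step 2: add cost-free features from paid groups until the best candidate
    # no longer beats the best shadow (randomly permuted) feature.
    free = [j for j in range(n_features)
            if groups[j] in paid and j not in selected]
    while free:
        scores = {j: score(X[:, j], chosen()) for j in free}
        best = max(scores, key=scores.get)
        shadow_best = max(score(rng.permutation(X[:, j]), chosen()) for j in free)
        if scores[best] <= shadow_best:
            break
        selected.append(best)
        free.remove(best)
    return selected, spent
```

Because the stop rule compares candidates against shuffled copies rather than a tuned penalty, no penalty parameter has to be optimized, which is the advantage the abstract highlights.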

K-Hairstyle: A Large-scale Korean hairstyle dataset for virtual hair editing and hairstyle classification

Feb 11, 2021

Taewoo Kim, Chaeyeon Chung, Sunghyun Park, Gyojung Gu, Keonmin Nam, Wonzo Choe, Jaesung Lee, Jaegul Choo

Abstract: The hair and beauty industry is one of the fastest-growing industries. This has led to the development of various applications, such as virtual hair dyeing and hairstyle translation, to satisfy customers' needs. Although several public hair datasets are available for these applications, they consist of a limited number of low-resolution images, which restricts their performance on high-quality hair editing. Therefore, we introduce a novel large-scale Korean hairstyle dataset, K-hairstyle, with 256,679 high-resolution images. In addition, K-hairstyle contains various hair attributes annotated by Korean expert hair stylists, as well as hair segmentation masks. We validate the effectiveness of our dataset on several applications, such as hairstyle translation, hair classification, and hair retrieval. Furthermore, we will release K-hairstyle soon.

* hair dataset, classification, segmentation, hair dyeing, hairstyle translation

Automated segmentation of the pulmonary arteries in low-dose CT by vessel tracking

Jun 27, 2011

Jeremiah Wala, Sergei Fotin, Jaesung Lee, Artit Jirapatnakul, Alberto Biancardi, Anthony Reeves

Abstract: We present a fully automated method for top-down segmentation of the pulmonary arterial tree in low-dose thoracic CT images. The main basal pulmonary arteries are identified near the lung hilum by searching for candidate vessels adjacent to known airways, identified by our previously reported airway segmentation method. Model cylinders are iteratively fit to the vessels to track them into the lungs. Vessel bifurcations are detected by measuring the rate of change of vessel radii, and child vessels are segmented by initiating new trackers at bifurcation points. Validation is accomplished using our novel sparse surface (SS) evaluation metric. The SS metric was designed to quantify the magnitude of the segmentation error per vessel while significantly decreasing the manual marking burden for the human user. A total of 210 arteries and 205 veins were manually marked across seven test cases. 134/210 arteries were correctly segmented, with a specificity for arteries of 90%, and average segmentation error of 0.15 mm. This fully-automated segmentation is a promising method for improving lung nodule detection in low-dose CT screening scans, by separating vessels from surrounding iso-intensity objects.
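
The bifurcation test mentioned in this abstract, based on the rate of change of vessel radii, might be sketched as below. The uniform tracking step `step_mm`, the threshold `max_rate`, and the use of the absolute rate are assumptions for illustration; the abstract does not state the actual criterion or its parameters.

```python
import numpy as np


def bifurcation_candidates(radii_mm, step_mm=1.0, max_rate=0.5):
    """Return indices along a tracked vessel where the fitted radius changes abruptly.

    radii_mm: radii of successive fitted cylinders, in tracking order.
    step_mm: assumed uniform distance between successive cylinders.
    max_rate: hypothetical threshold on |radius change| per mm.
    """
    radii = np.asarray(radii_mm, dtype=float)
    # Finite-difference estimate of d(radius)/d(arc length).
    rate = np.diff(radii) / step_mm
    # Index i + 1 is the cylinder at which the jump is observed.
    return [int(i) + 1 for i in np.flatnonzero(np.abs(rate) > max_rate)]
```

In a tracker of the kind the abstract describes, each flagged position would be a point at which to start new trackers for the child vessels.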
