Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Sanghoon Lee

Unveiling the Invisible: Reasoning Complex Occlusions Amodally with AURA

Mar 13, 2025

Zhixuan Li, Hyunse Yoon, Sanghoon Lee, Weisi Lin

Abstract:Amodal segmentation aims to infer the complete shape of occluded objects, even when the occluded region's appearance is unavailable. However, current amodal segmentation methods lack the capability to interact with users through text input and struggle to understand or reason about implicit and complex purposes. While methods like LISA integrate multi-modal large language models (LLMs) with segmentation for reasoning tasks, they are limited to predicting only visible object regions and face challenges in handling complex occlusion scenarios. To address these limitations, we propose a novel task named amodal reasoning segmentation, aiming to predict the complete amodal shape of occluded objects while providing answers with elaborations based on user text input. We develop a generalizable dataset generation pipeline and introduce a new dataset focusing on daily life scenarios, encompassing diverse real-world occlusions. Furthermore, we present AURA (Amodal Understanding and Reasoning Assistant), a novel model with advanced global and spatial-level designs specifically tailored to handle complex occlusions. Extensive experiments validate AURA's effectiveness on the proposed dataset. The code, model, and dataset will be publicly released.

* 11 pages, 5 figures, 5 tables

Via

Access Paper or Ask Questions

BEAT: Balanced Frequency Adaptive Tuning for Long-Term Time-Series Forecasting

Jan 31, 2025

Zhixuan Li, Naipeng Chen, Seonghwa Choi, Sanghoon Lee, Weisi Lin

Figure 1 for BEAT: Balanced Frequency Adaptive Tuning for Long-Term Time-Series Forecasting

Figure 2 for BEAT: Balanced Frequency Adaptive Tuning for Long-Term Time-Series Forecasting

Figure 3 for BEAT: Balanced Frequency Adaptive Tuning for Long-Term Time-Series Forecasting

Figure 4 for BEAT: Balanced Frequency Adaptive Tuning for Long-Term Time-Series Forecasting

Abstract:Time-series forecasting is crucial for numerous real-world applications including weather prediction and financial market modeling. While temporal-domain methods remain prevalent, frequency-domain approaches can effectively capture multi-scale periodic patterns, reduce sequence dependencies, and naturally denoise signals. However, existing approaches typically train model components for all frequencies under a unified training objective, often leading to mismatched learning speeds: high-frequency components converge faster and risk overfitting, while low-frequency components underfit due to insufficient training time. To deal with this challenge, we propose BEAT (Balanced frEquency Adaptive Tuning), a novel framework that dynamically monitors the training status for each frequency and adaptively adjusts their gradient updates. By recognizing convergence, overfitting, or underfitting for each frequency, BEAT dynamically reallocates learning priorities, moderating gradients for rapid learners and increasing those for slower ones, alleviating the tension between competing objectives across frequencies and synchronizing the overall learning process. Extensive experiments on seven real-world datasets demonstrate that BEAT consistently outperforms state-of-the-art approaches.

* 12 pages, 3 figures

Via

Access Paper or Ask Questions

TurboHopp: Accelerated Molecule Scaffold Hopping with Consistency Models

Oct 28, 2024

Kiwoong Yoo, Owen Oertell, Junhyun Lee, Sanghoon Lee, Jaewoo Kang

Figure 1 for TurboHopp: Accelerated Molecule Scaffold Hopping with Consistency Models

Figure 2 for TurboHopp: Accelerated Molecule Scaffold Hopping with Consistency Models

Figure 3 for TurboHopp: Accelerated Molecule Scaffold Hopping with Consistency Models

Figure 4 for TurboHopp: Accelerated Molecule Scaffold Hopping with Consistency Models

Abstract:Navigating the vast chemical space of druggable compounds is a formidable challenge in drug discovery, where generative models are increasingly employed to identify viable candidates. Conditional 3D structure-based drug design (3D-SBDD) models, which take into account complex three-dimensional interactions and molecular geometries, are particularly promising. Scaffold hopping is an efficient strategy that facilitates the identification of similar active compounds by strategically modifying the core structure of molecules, effectively narrowing the wide chemical space and enhancing the discovery of drug-like products. However, the practical application of 3D-SBDD generative models is hampered by their slow processing speeds. To address this bottleneck, we introduce TurboHopp, an accelerated pocket-conditioned 3D scaffold hopping model that merges the strategic effectiveness of traditional scaffold hopping with rapid generation capabilities of consistency models. This synergy not only enhances efficiency but also significantly boosts generation speeds, achieving up to 30 times faster inference speed as well as superior generation quality compared to existing diffusion-based models, establishing TurboHopp as a powerful tool in drug discovery. Supported by faster inference speed, we further optimize our model, using Reinforcement Learning for Consistency Models (RLCM), to output desirable molecules. We demonstrate the broad applicability of TurboHopp across multiple drug discovery scenarios, underscoring its potential in diverse molecular settings.

* 22 pages, 11 figures, 8 tables. Presented at NeurIPS 2024

Via

Access Paper or Ask Questions

Learning-enabled Flexible Job-shop Scheduling for Scalable Smart Manufacturing

Feb 14, 2024

Sihoon Moon, Sanghoon Lee, Kyung-Joon Park

Abstract:In smart manufacturing systems (SMSs), flexible job-shop scheduling with transportation constraints (FJSPT) is essential to optimize solutions for maximizing productivity, considering production flexibility based on automated guided vehicles (AGVs). Recent developments in deep reinforcement learning (DRL)-based methods for FJSPT have encountered a scale generalization challenge. These methods underperform when applied to environment at scales different from their training set, resulting in low-quality solutions. To address this, we introduce a novel graph-based DRL method, named the Heterogeneous Graph Scheduler (HGS). Our method leverages locally extracted relational knowledge among operations, machines, and vehicle nodes for scheduling, with a graph-structured decision-making framework that reduces encoding complexity and enhances scale generalization. Our performance evaluation, conducted with benchmark datasets, reveals that the proposed method outperforms traditional dispatching rules, meta-heuristics, and existing DRL-based approaches in terms of makespan performance, even on large-scale instances that have not been experienced during training.

Via

Access Paper or Ask Questions

MolPLA: A Molecular Pretraining Framework for Learning Cores, R-Groups and their Linker Joints

Jan 30, 2024

Mogan Gim, Jueon Park, Soyon Park, Sanghoon Lee, Seungheun Baek, Junhyun Lee, Ngoc-Quang Nguyen, Jaewoo Kang

Abstract:Molecular core structures and R-groups are essential concepts in drug development. Integration of these concepts with conventional graph pre-training approaches can promote deeper understanding in molecules. We propose MolPLA, a novel pre-training framework that employs masked graph contrastive learning in understanding the underlying decomposable parts inmolecules that implicate their core structure and peripheral R-groups. Furthermore, we formulate an additional framework that grants MolPLA the ability to help chemists find replaceable R-groups in lead optimization scenarios. Experimental results on molecular property prediction show that MolPLA exhibits predictability comparable to current state-of-the-art models. Qualitative analysis implicate that MolPLA is capable of distinguishing core and R-group sub-structures, identifying decomposable regions in molecules and contributing to lead optimization scenarios by rationally suggesting R-group replacements given various query core templates. The code implementation for MolPLA and its pre-trained model checkpoint is available at https://github.com/dmis-lab/MolPLA

Via

Access Paper or Ask Questions

Camera-Driven Representation Learning for Unsupervised Domain Adaptive Person Re-identification

Aug 23, 2023

Geon Lee, Sanghoon Lee, Dohyung Kim, Younghoon Shin, Yongsang Yoon, Bumsub Ham

Figure 1 for Camera-Driven Representation Learning for Unsupervised Domain Adaptive Person Re-identification

Figure 2 for Camera-Driven Representation Learning for Unsupervised Domain Adaptive Person Re-identification

Figure 3 for Camera-Driven Representation Learning for Unsupervised Domain Adaptive Person Re-identification

Figure 4 for Camera-Driven Representation Learning for Unsupervised Domain Adaptive Person Re-identification

Abstract:We present a novel unsupervised domain adaption method for person re-identification (reID) that generalizes a model trained on a labeled source domain to an unlabeled target domain. We introduce a camera-driven curriculum learning (CaCL) framework that leverages camera labels of person images to transfer knowledge from source to target domains progressively. To this end, we divide target domain dataset into multiple subsets based on the camera labels, and initially train our model with a single subset (i.e., images captured by a single camera). We then gradually exploit more subsets for training, according to a curriculum sequence obtained with a camera-driven scheduling rule. The scheduler considers maximum mean discrepancies (MMD) between each subset and the source domain dataset, such that the subset closer to the source domain is exploited earlier within the curriculum. For each curriculum sequence, we generate pseudo labels of person images in a target domain to train a reID model in a supervised way. We have observed that the pseudo labels are highly biased toward cameras, suggesting that person images obtained from the same camera are likely to have the same pseudo labels, even for different IDs. To address the camera bias problem, we also introduce a camera-diversity (CD) loss encouraging person images of the same pseudo label, but captured across various cameras, to involve more for discriminative feature learning, providing person representations robust to inter-camera variations. Experimental results on standard benchmarks, including real-to-real and synthetic-to-real scenarios, demonstrate the effectiveness of our framework.

* Accepted to ICCV 2023

Via

Access Paper or Ask Questions

Extracting Structured Seed-Mediated Gold Nanorod Growth Procedures from Literature with GPT-3

Apr 26, 2023

Nicholas Walker, John Dagdelen, Kevin Cruse, Sanghoon Lee, Samuel Gleason, Alexander Dunn, Gerbrand Ceder, A. Paul Alivisatos, Kristin A. Persson, Anubhav Jain

Abstract:Although gold nanorods have been the subject of much research, the pathways for controlling their shape and thereby their optical properties remain largely heuristically understood. Although it is apparent that the simultaneous presence of and interaction between various reagents during synthesis control these properties, computational and experimental approaches for exploring the synthesis space can be either intractable or too time-consuming in practice. This motivates an alternative approach leveraging the wealth of synthesis information already embedded in the body of scientific literature by developing tools to extract relevant structured data in an automated, high-throughput manner. To that end, we present an approach using the powerful GPT-3 language model to extract structured multi-step seed-mediated growth procedures and outcomes for gold nanorods from unstructured scientific text. GPT-3 prompt completions are fine-tuned to predict synthesis templates in the form of JSON documents from unstructured text input with an overall accuracy of $86\%$. The performance is notable, considering the model is performing simultaneous entity recognition and relation extraction. We present a dataset of 11,644 entities extracted from 1,137 papers, resulting in 268 papers with at least one complete seed-mediated gold nanorod growth procedure and outcome for a total of 332 complete procedures.

Via

Access Paper or Ask Questions

Structured information extraction from complex scientific text with fine-tuned large language models

Dec 10, 2022

Alexander Dunn, John Dagdelen, Nicholas Walker, Sanghoon Lee, Andrew S. Rosen, Gerbrand Ceder, Kristin Persson, Anubhav Jain

Figure 1 for Structured information extraction from complex scientific text with fine-tuned large language models

Figure 2 for Structured information extraction from complex scientific text with fine-tuned large language models

Figure 3 for Structured information extraction from complex scientific text with fine-tuned large language models

Figure 4 for Structured information extraction from complex scientific text with fine-tuned large language models

Abstract:Intelligently extracting and linking complex scientific information from unstructured text is a challenging endeavor particularly for those inexperienced with natural language processing. Here, we present a simple sequence-to-sequence approach to joint named entity recognition and relation extraction for complex hierarchical information in scientific text. The approach leverages a pre-trained large language model (LLM), GPT-3, that is fine-tuned on approximately 500 pairs of prompts (inputs) and completions (outputs). Information is extracted either from single sentences or across sentences in abstracts/passages, and the output can be returned as simple English sentences or a more structured format, such as a list of JSON objects. We demonstrate that LLMs trained in this way are capable of accurately extracting useful records of complex scientific knowledge for three representative tasks in materials chemistry: linking dopants with their host materials, cataloging metal-organic frameworks, and general chemistry/phase/morphology/application information extraction. This approach represents a simple, accessible, and highly-flexible route to obtaining large databases of structured knowledge extracted from unstructured text. An online demo is available at http://www.matscholar.com/info-extraction.

Via

Access Paper or Ask Questions

Decomposed Knowledge Distillation for Class-Incremental Semantic Segmentation

Oct 12, 2022

Donghyeon Baek, Youngmin Oh, Sanghoon Lee, Junghyup Lee, Bumsub Ham

Figure 1 for Decomposed Knowledge Distillation for Class-Incremental Semantic Segmentation

Figure 2 for Decomposed Knowledge Distillation for Class-Incremental Semantic Segmentation

Figure 3 for Decomposed Knowledge Distillation for Class-Incremental Semantic Segmentation

Figure 4 for Decomposed Knowledge Distillation for Class-Incremental Semantic Segmentation

Abstract:Class-incremental semantic segmentation (CISS) labels each pixel of an image with a corresponding object/stuff class continually. To this end, it is crucial to learn novel classes incrementally without forgetting previously learned knowledge. Current CISS methods typically use a knowledge distillation (KD) technique for preserving classifier logits, or freeze a feature extractor, to avoid the forgetting problem. The strong constraints, however, prevent learning discriminative features for novel classes. We introduce a CISS framework that alleviates the forgetting problem and facilitates learning novel classes effectively. We have found that a logit can be decomposed into two terms. They quantify how likely an input belongs to a particular class or not, providing a clue for a reasoning process of a model. The KD technique, in this context, preserves the sum of two terms (i.e., a class logit), suggesting that each could be changed and thus the KD does not imitate the reasoning process. To impose constraints on each term explicitly, we propose a new decomposed knowledge distillation (DKD) technique, improving the rigidity of a model and addressing the forgetting problem more effectively. We also introduce a novel initialization method to train new classifiers for novel classes. In CISS, the number of negative training samples for novel classes is not sufficient to discriminate old classes. To mitigate this, we propose to transfer knowledge of negatives to the classifiers successively using an auxiliary classifier, boosting the performance significantly. Experimental results on standard CISS benchmarks demonstrate the effectiveness of our framework.

* Accepted to NeurIPS 2022

Via

Access Paper or Ask Questions

OIMNet++: Prototypical Normalization and Localization-aware Learning for Person Search

Jul 21, 2022

Sanghoon Lee, Youngmin Oh, Donghyeon Baek, Junghyup Lee, Bumsub Ham

Figure 1 for OIMNet++: Prototypical Normalization and Localization-aware Learning for Person Search

Figure 2 for OIMNet++: Prototypical Normalization and Localization-aware Learning for Person Search

Figure 3 for OIMNet++: Prototypical Normalization and Localization-aware Learning for Person Search

Figure 4 for OIMNet++: Prototypical Normalization and Localization-aware Learning for Person Search

Abstract:We address the task of person search, that is, localizing and re-identifying query persons from a set of raw scene images. Recent approaches are typically built upon OIMNet, a pioneer work on person search, that learns joint person representations for performing both detection and person re-identification (reID) tasks. To obtain the representations, they extract features from pedestrian proposals, and then project them on a unit hypersphere with L2 normalization. These methods also incorporate all positive proposals, that sufficiently overlap with the ground truth, equally to learn person representations for reID. We have found that 1) the L2 normalization without considering feature distributions degenerates the discriminative power of person representations, and 2) positive proposals often also depict background clutter and person overlaps, which could encode noisy features to person representations. In this paper, we introduce OIMNet++ that addresses the aforementioned limitations. To this end, we introduce a novel normalization layer, dubbed ProtoNorm, that calibrates features from pedestrian proposals, while considering a long-tail distribution of person IDs, enabling L2 normalized person representations to be discriminative. We also propose a localization-aware feature learning scheme that encourages better-aligned proposals to contribute more in learning discriminative representations. Experimental results and analysis on standard person search benchmarks demonstrate the effectiveness of OIMNet++.

* Accepted to ECCV 2022

Via

Access Paper or Ask Questions