Abstract:Fine-tuning large language models (LLMs) to align with user preferences is challenging due to the high cost of quality human annotations in Reinforcement Learning from Human Feedback (RLHF) and the generalizability limitations of AI Feedback. To address these challenges, we propose RLTHF, a human-AI hybrid framework that combines LLM-based initial alignment with selective human annotations to achieve full-human annotation alignment with minimal effort. RLTHF identifies hard-to-annotate samples mislabeled by LLMs using a reward model's reward distribution and iteratively enhances alignment by integrating strategic human corrections while leveraging LLM's correctly labeled samples. Evaluations on HH-RLHF and TL;DR datasets show that RLTHF reaches full-human annotation-level alignment with only 6-7% of the human annotation effort. Furthermore, models trained on RLTHF's curated datasets for downstream tasks outperform those trained on fully human-annotated datasets, underscoring the effectiveness of RLTHF's strategic data curation.
Abstract:Large-scale study of glaciers improves our understanding of global glacier change and is imperative for monitoring the ecological environment, preventing disasters, and studying the effects of global climate change. Glaciers in the Hindu Kush Himalaya (HKH) are particularly interesting as the HKH is one of the world's most sensitive regions for climate change. In this work, we: (1) propose a modified version of the U-Net for large-scale, spatially non-overlapping, clean glacial ice, and debris-covered glacial ice segmentation; (2) introduce a novel self-learning boundary-aware loss to improve debris-covered glacial ice segmentation performance; and (3) propose a feature-wise saliency score to understand the contribution of each feature in the multispectral Landsat 7 imagery for glacier mapping. Our results show that the debris-covered glacial ice segmentation model trained using self-learning boundary-aware loss outperformed the model trained using dice loss. Furthermore, we conclude that red, shortwave infrared, and near-infrared bands have the highest contribution toward debris-covered glacial ice segmentation from Landsat 7 images.
Abstract:Glacier mapping is key to ecological monitoring in the hkh region. Climate change poses a risk to individuals whose livelihoods depend on the health of glacier ecosystems. In this work, we present a machine learning based approach to support ecological monitoring, with a focus on glaciers. Our approach is based on semi-automated mapping from satellite images. We utilize readily available remote sensing data to create a model to identify and outline both clean ice and debris-covered glaciers from satellite imagery. We also release data and develop a web tool that allows experts to visualize and correct model predictions, with the ultimate aim of accelerating the glacier mapping process.