Abstract: Vision-language foundation models, such as CLIP, have shown unprecedented zero-shot performance across a wide range of tasks. Nevertheless, these models can be unreliable under distributional shifts, where their performance degrades significantly. In this work, we explore how to efficiently leverage class text information to mitigate the distribution shifts encountered by large pre-trained vision-language models (VLMs) during test-time inference. In particular, we propose to generate pseudo-labels for the test-time samples by exploiting generic class text embeddings as fixed centroids of a label assignment problem, which is efficiently solved with Optimal Transport. Furthermore, the proposed adaptation method (CLIP-OT) integrates a multiple-template knowledge distillation approach, which replicates multi-view contrastive learning strategies in unsupervised representation learning without incurring additional computational complexity. Extensive experiments on multiple popular test-time adaptation benchmarks of diverse complexity empirically show the superiority of CLIP-OT, which achieves performance gains of up to 7% over recent state-of-the-art methods while remaining computationally and memory efficient.
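The label assignment step described in this abstract can be illustrated with a minimal sketch. It treats class text embeddings as fixed centroids and uses Sinkhorn-Knopp iterations, a common way to solve entropy-regularized Optimal Transport. The function name, the uniform class marginal, and the values of `epsilon` and `n_iters` are assumptions made for illustration; the abstract does not specify them, and this is not presented as the CLIP-OT implementation.

```python
import numpy as np

def sinkhorn_pseudo_labels(image_feats, text_feats, epsilon=0.05, n_iters=50):
    """Soft pseudo-labels for a batch via entropic optimal transport.

    image_feats: (B, D) L2-normalized image embeddings.
    text_feats:  (K, D) L2-normalized class text embeddings (fixed centroids).
    Returns a (B, K) array whose rows sum to 1.
    epsilon and n_iters are illustrative defaults (assumptions).
    """
    sims = image_feats @ text_feats.T              # cosine similarities, (B, K)
    # Gibbs kernel; subtracting the max keeps the exponent numerically stable.
    Q = np.exp((sims - sims.max()) / epsilon)
    B, K = Q.shape
    Q /= Q.sum()
    for _ in range(n_iters):
        # Column step: every class receives total mass 1/K (assumed uniform marginal).
        Q /= Q.sum(axis=0, keepdims=True) * K
        # Row step: every sample carries total mass 1/B.
        Q /= Q.sum(axis=1, keepdims=True) * B
    return Q * B                                   # each row is now a distribution

# Toy usage with random unit-norm features standing in for CLIP embeddings.
rng = np.random.default_rng(0)
imgs = rng.normal(size=(8, 16))
imgs /= np.linalg.norm(imgs, axis=1, keepdims=True)
txts = rng.normal(size=(4, 16))
txts /= np.linalg.norm(txts, axis=1, keepdims=True)
pseudo_labels = sinkhorn_pseudo_labels(imgs, txts).argmax(axis=1)
```

Because the text embeddings are never updated, the only per-batch cost in this sketch is the similarity matrix and a few normalization passes over it.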
Abstract: In modern urban centers, effective transportation management poses a significant challenge, with traffic jams and inconsistent travel durations greatly affecting commuters and logistics operations. This study introduces a novel method for enhancing urban mobility by combining machine learning algorithms with live traffic information. We developed predictive models for journey time and congestion analysis using data from New York City's yellow taxi trips. The research employed a spatiotemporal analysis framework to identify traffic trends and implemented real-time route optimization using the GraphHopper API. This system determines the most efficient paths based on current conditions, adapting to changes in traffic flow. The methodology uses Spark MLlib for predictive modeling and Spark Streaming for processing data in real time. By integrating historical data analysis with current traffic inputs, our system shows notable improvements in both travel-time forecasts and route optimization, demonstrating its potential for widespread application in major urban areas. This research contributes to ongoing efforts aimed at reducing urban congestion and improving transportation efficiency through advanced data-driven methods.
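As a rough illustration of the kind of temporal aggregation that congestion analysis on taxi trips involves, the sketch below computes the mean trip speed per pickup hour. It is plain Python rather than Spark, and the record layout (pickup time, dropoff time, distance in miles), the function name, and the filtering rule are all hypothetical; the abstract does not describe the actual features or pipeline.

```python
from collections import defaultdict
from datetime import datetime

def hourly_mean_speed(trips):
    """Mean trip speed (mph) per pickup hour.

    trips: iterable of (pickup_iso, dropoff_iso, distance_miles) tuples.
    This record layout is an assumption made for the example.
    """
    totals = defaultdict(lambda: [0.0, 0])   # hour -> [sum of speeds, trip count]
    for pickup, dropoff, miles in trips:
        start = datetime.fromisoformat(pickup)
        end = datetime.fromisoformat(dropoff)
        hours = (end - start).total_seconds() / 3600.0
        if hours <= 0 or miles <= 0:
            continue                          # skip malformed records
        bucket = totals[start.hour]
        bucket[0] += miles / hours
        bucket[1] += 1
    return {hour: s / n for hour, (s, n) in sorted(totals.items())}

sample_trips = [
    ("2016-01-01T08:00:00", "2016-01-01T08:30:00", 5.0),
    ("2016-01-01T08:10:00", "2016-01-01T08:40:00", 10.0),
    ("2016-01-01T23:00:00", "2016-01-01T23:15:00", 5.0),
]
speeds = hourly_mean_speed(sample_trips)
```

Low mean speeds in a given hour can serve as a simple congestion proxy; a predictive model would typically take such aggregates as input features alongside location.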
Abstract: State-of-the-art semi-supervised learning (SSL) approaches rely on highly confident predictions to serve as pseudo-labels that guide the training on unlabeled samples. An inherent drawback of this strategy stems from the quality of the uncertainty estimates, as pseudo-labels are filtered based only on their degree of uncertainty, regardless of the correctness of their predictions. Thus, assessing and improving the uncertainty estimates of network predictions is of paramount importance in the pseudo-labeling process. In this work, we empirically demonstrate that SSL methods based on pseudo-labels are significantly miscalibrated, and formally show that minimizing the min-entropy, a lower bound of the Shannon entropy, is a potential cause of this miscalibration. To alleviate this issue, we integrate a simple penalty term, which enforces the logit distances of the predictions on unlabeled samples to remain low, preventing the network predictions from becoming overconfident. Comprehensive experiments on a variety of SSL image classification benchmarks demonstrate that the proposed solution systematically improves the calibration performance of relevant SSL models while also enhancing their discriminative power, making it an appealing addition for tackling SSL tasks.
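The two quantities discussed in this abstract can be made concrete with a small sketch. The first part checks numerically that the min-entropy, -log(max_k p_k), never exceeds the Shannon entropy. The second part shows one possible penalty on logit distances, a hinge on the gap between the largest logit and every other logit. The hinge form, the `margin` value, and the function names are assumptions for illustration; the abstract only states that the penalty keeps logit distances low.

```python
import math

def softmax(logits):
    m = max(logits)                               # shift for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def shannon_entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def min_entropy(probs):
    # Renyi entropy of order infinity; a lower bound of the Shannon entropy.
    return -math.log(max(probs))

def logit_distance_penalty(logits, margin=10.0):
    """Hinge penalty on distances to the largest logit (assumed form).

    Distances up to `margin` are free; larger ones are penalized linearly.
    """
    top = max(logits)
    return sum(max(0.0, top - z - margin) for z in logits)

logits = [12.0, 1.0, 0.0]
probs = softmax(logits)
bound_holds = min_entropy(probs) <= shannon_entropy(probs)
penalty = logit_distance_penalty(logits)          # distances 0, 11, 12 with margin 10
```

In a training loop, a term of this kind would be added to the unsupervised loss with a weighting coefficient, so that confident pseudo-labels cannot push the logit gaps arbitrarily far apart.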