Abstract:Scaling laws offer valuable insights into the design of time series foundation models (TSFMs). However, previous research has largely focused on the scaling laws of TSFMs for in-distribution (ID) data, leaving their out-of-distribution (OOD) scaling behavior and the influence of model architectures less explored. In this work, we examine two common TSFM architectures, encoder-only and decoder-only Transformers, and investigate their scaling behavior on both ID and OOD data. These models are trained and evaluated across varying parameter counts, compute budgets, and dataset sizes. Our experiments reveal that the log-likelihood loss of TSFMs exhibits similar scaling behavior in both OOD and ID settings. We further compare the scaling properties across different architectures, incorporating two state-of-the-art TSFMs as case studies, showing that model architecture plays a significant role in scaling. The encoder-only Transformers demonstrate better scalability than the decoder-only Transformers, while the architectural enhancements in the two advanced TSFMs primarily improve ID performance but reduce OOD scalability. While scaling up TSFMs is expected to drive performance breakthroughs, the lack of a comprehensive understanding of TSFM scaling laws has hindered the development of a robust framework to guide model scaling. We fill this gap in this work by synthesizing our findings and providing practical guidelines for designing and scaling larger TSFMs with enhanced model capabilities.
Abstract:Hyperspectral image change detection (HSI-CD) aims to identify the differences in bitemporal HSIs. To mitigate spectral redundancy and improve the discriminativeness of changing features, some methods introduced band selection technology to select bands conducive for CD. However, these methods are limited by the inability to end-to-end training with the deep learning-based feature extractor and lack considering the complex nonlinear relationship among bands. In this paper, we propose an end-to-end efficient spectral-spatial change detection network (ES2Net) to address these issues. Specifically, we devised a learnable band selection module to automatically select bands conducive to CD. It can be jointly optimized with a feature extraction network and capture the complex nonlinear relationships among bands. Moreover, considering the large spatial feature distribution differences among different bands, we design the cluster-wise spatial attention mechanism that assigns a spatial attention factor to each individual band to individually improve the feature discriminativeness for each band. Experiments on three widely used HSI-CD datasets demonstrate the effectiveness and superiority of this method compared with other state-of-the-art methods.