Earth Observation (EO) data encompass a vast range of remotely sensed information, featuring multi-sensor and multi-temporal, playing an indispensable role in understanding our planet's dynamics. Recently, Vision Language Models (VLMs) have achieved remarkable success in perception and reasoning tasks, bringing new insights and opportunities to the EO field. However, the potential for EO applications, especially for scientific regression related applications remains largely unexplored. This paper bridges that gap by systematically examining the challenges and opportunities of adapting VLMs for EO regression tasks. The discussion first contrasts the distinctive properties of EO data with conventional computer vision datasets, then identifies four core obstacles in applying VLMs to EO regression: 1) the absence of dedicated benchmarks, 2) the discrete-versus-continuous representation mismatch, 3) cumulative error accumulation, and 4) the suboptimal nature of text-centric training objectives for numerical tasks. Next, a series of methodological insights and potential subtle pitfalls are explored. Lastly, we offer some promising future directions for designing robust, domain-aware solutions. Our findings highlight the promise of VLMs for scientific regression in EO, setting the stage for more precise and interpretable modeling of critical environmental processes.