INAF, Padova
Abstract:Gaia will obtain astrometry and spectrophotometry for essentially all sources in the sky down to a broad band magnitude limit of G=20, an expected yield of 10^9 stars. Its main scientific objective is to reveal the formation and evolution of our Galaxy through chemo-dynamical analysis. In addition to inferring positions, parallaxes and proper motions from the astrometry, we must also infer the astrophysical parameters of the stars from the spectrophotometry, the BP/RP spectrum. Here we investigate the performance of three different algorithms (SVM, ILIUM, Aeneas) for estimating the effective temperature, line-of-sight interstellar extinction, metallicity and surface gravity of A-M stars over a wide range of these parameters and over the full magnitude range Gaia will observe (G=6-20mag). One of the algorithms, Aeneas, infers the posterior probability density function over all parameters, and can optionally take into account the parallax and the Hertzsprung-Russell diagram to improve the estimates. For all algorithms the accuracy of estimation depends on G and on the value of the parameters themselves, so a broad summary of performance is only approximate. For stars at G=15 with less than two magnitudes extinction, we expect to be able to estimate Teff to within 1%, logg to 0.1-0.2dex, and [Fe/H] (for FGKM stars) to 0.1-0.2dex, just using the BP/RP spectrum (mean absolute error statistics are quoted). Performance degrades at larger extinctions, but not always by a large amount. Extinction can be estimated to an accuracy of 0.05-0.2mag for stars across the full parameter range with a priori unknown extinction between 0 and 10mag. Performance degrades at fainter magnitudes, but even at G=19 we can estimate logg to better than 0.2dex for all spectral types, and [Fe/H] to within 0.35dex for FGKM stars, for extinctions below 1mag.
Abstract:We develop and demonstrate a probabilistic method for classifying rare objects in surveys with the particular goal of building very pure samples. It works by modifying the output probabilities from a classifier so as to accommodate our expectation (priors) concerning the relative frequencies of different classes of objects. We demonstrate our method using the Discrete Source Classifier, a supervised classifier currently based on Support Vector Machines, which we are developing in preparation for the Gaia data analysis. DSC classifies objects using their very low resolution optical spectra. We look in detail at the problem of quasar classification, because identification of a pure quasar sample is necessary to define the Gaia astrometric reference frame. By varying a posterior probability threshold in DSC we can trade off sample completeness and contamination. We show, using our simulated data, that it is possible to achieve a pure sample of quasars (upper limit on contamination of 1 in 40,000) with a completeness of 65% at magnitudes of G=18.5, and 50% at G=20.0, even when quasars have a frequency of only 1 in every 2000 objects. The star sample completeness is simultaneously 99% with a contamination of 0.7%. Including parallax and proper motion in the classifier barely changes the results. We further show that not accounting for class priors in the target population leads to serious misclassifications and poor predictions for sample completeness and contamination. (Truncated)