Max Planck Institute for Astronomy, Heidelberg
Abstract:I introduce a general, Bayesian method for modelling univariate time series data assumed to be drawn from a continuous, stochastic process. The method accommodates arbitrary temporal sampling, and takes into account measurement uncertainties for arbitrary error models (not just Gaussian) on both the time and signal variables. Any model for the deterministic component of the variation of the signal with time is supported, as is any model of the stochastic component on the signal and time variables. Models illustrated here are constant and sinusoidal models for the signal mean combined with a Gaussian stochastic component, as well as a purely stochastic model, the Ornstein-Uhlenbeck process. The posterior probability distribution over model parameters is determined via Monte Carlo sampling. Models are compared using the "cross-validation likelihood", in which the posterior-averaged likelihood for different partitions of the data are combined. In principle this is more robust to changes in the prior than is the evidence (the prior-averaged likelihood). The method is demonstrated by applying it to the light curves of 11 ultra cool dwarf stars, claimed by a previous study to show statistically significant variability. This is reassessed here by calculating the cross-validation likelihood for various time series models, including a null hypothesis of no variability beyond the error bars. 10 of 11 light curves are confirmed as being significantly variable, and one of these seems to be periodic, with two plausible periods identified. Another object is best described by the Ornstein-Uhlenbeck process, a conclusion which is obviously limited to the set of models actually tested.
Abstract:Gaia will obtain astrometry and spectrophotometry for essentially all sources in the sky down to a broad band magnitude limit of G=20, an expected yield of 10^9 stars. Its main scientific objective is to reveal the formation and evolution of our Galaxy through chemo-dynamical analysis. In addition to inferring positions, parallaxes and proper motions from the astrometry, we must also infer the astrophysical parameters of the stars from the spectrophotometry, the BP/RP spectrum. Here we investigate the performance of three different algorithms (SVM, ILIUM, Aeneas) for estimating the effective temperature, line-of-sight interstellar extinction, metallicity and surface gravity of A-M stars over a wide range of these parameters and over the full magnitude range Gaia will observe (G=6-20mag). One of the algorithms, Aeneas, infers the posterior probability density function over all parameters, and can optionally take into account the parallax and the Hertzsprung-Russell diagram to improve the estimates. For all algorithms the accuracy of estimation depends on G and on the value of the parameters themselves, so a broad summary of performance is only approximate. For stars at G=15 with less than two magnitudes extinction, we expect to be able to estimate Teff to within 1%, logg to 0.1-0.2dex, and [Fe/H] (for FGKM stars) to 0.1-0.2dex, just using the BP/RP spectrum (mean absolute error statistics are quoted). Performance degrades at larger extinctions, but not always by a large amount. Extinction can be estimated to an accuracy of 0.05-0.2mag for stars across the full parameter range with a priori unknown extinction between 0 and 10mag. Performance degrades at fainter magnitudes, but even at G=19 we can estimate logg to better than 0.2dex for all spectral types, and [Fe/H] to within 0.35dex for FGKM stars, for extinctions below 1mag.
Abstract:I introduce an algorithm for estimating parameters from multidimensional data based on forward modelling. In contrast to many machine learning approaches it avoids fitting an inverse model and the problems associated with this. The algorithm makes explicit use of the sensitivities of the data to the parameters, with the goal of better treating parameters which only have a weak impact on the data. The forward modelling approach provides uncertainty (full covariance) estimates in the predicted parameters as well as a goodness-of-fit for observations. I demonstrate the algorithm, ILIUM, with the estimation of stellar astrophysical parameters (APs) from simulations of the low resolution spectrophotometry to be obtained by Gaia. The AP accuracy is competitive with that obtained by a support vector machine. For example, for zero extinction stars covering a wide range of metallicity, surface gravity and temperature, ILIUM can estimate Teff to an accuracy of 0.3% at G=15 and to 4% for (lower signal-to-noise ratio) spectra at G=20. [Fe/H] and logg can be estimated to accuracies of 0.1-0.4dex for stars with G<=18.5. If extinction varies a priori over a wide range (Av=0-10mag), then Teff and Av can be estimated quite accurately (3-4% and 0.1-0.2mag respectively at G=15), but there is a strong and ubiquitous degeneracy in these parameters which limits our ability to estimate either accurately at faint magnitudes. Using the forward model we can map these degeneracies (in advance), and thus provide a complete probability distribution over solutions. (Abridged)
Abstract:We develop and demonstrate a probabilistic method for classifying rare objects in surveys with the particular goal of building very pure samples. It works by modifying the output probabilities from a classifier so as to accommodate our expectation (priors) concerning the relative frequencies of different classes of objects. We demonstrate our method using the Discrete Source Classifier, a supervised classifier currently based on Support Vector Machines, which we are developing in preparation for the Gaia data analysis. DSC classifies objects using their very low resolution optical spectra. We look in detail at the problem of quasar classification, because identification of a pure quasar sample is necessary to define the Gaia astrometric reference frame. By varying a posterior probability threshold in DSC we can trade off sample completeness and contamination. We show, using our simulated data, that it is possible to achieve a pure sample of quasars (upper limit on contamination of 1 in 40,000) with a completeness of 65% at magnitudes of G=18.5, and 50% at G=20.0, even when quasars have a frequency of only 1 in every 2000 objects. The star sample completeness is simultaneously 99% with a contamination of 0.7%. Including parallax and proper motion in the classifier barely changes the results. We further show that not accounting for class priors in the target population leads to serious misclassifications and poor predictions for sample completeness and contamination. (Truncated)
Abstract:Designing a photometric system to best fulfil a set of scientific goals is a complex task, demanding a compromise between conflicting requirements and subject to various constraints. A specific example is the determination of stellar astrophysical parameters (APs) - effective temperature, metallicity etc. - across a wide range of stellar types. I present a novel approach to this problem which makes minimal assumptions about the required filter system. By considering a filter system as a set of free parameters it may be designed by optimizing some figure-of-merit (FoM) with respect to these parameters. In the example considered, the FoM is a measure of how well the filter system can `separate' stars with different APs. This separation is vectorial in nature, in the sense that the local directions of AP variance are preferably mutually orthogonal to avoid AP degeneracy. The optimization is carried out with an evolutionary algorithm, which uses principles of evolutionary biology to search the parameter space. This model, HFD (Heuristic Filter Design), is applied to the design of photometric systems for the Gaia space astrometry mission. The optimized systems show a number of interesting features, not least the persistence of broad, overlapping filters. These HFD systems perform as least as well as other proposed systems for Gaia, although inadequacies remain in all. The principles underlying HFD are quite generic and may be applied to filter design for numerous other projects, such as the search for specific types of objects or photometric redshift determination.