Abstract: We explore in some detail the notion of algorithmic stability as a viable framework for analyzing the generalization error of learning algorithms. We introduce the new notion of training stability of a learning algorithm and show that, in a general setting, it is sufficient for good bounds on generalization error. In the PAC setting, training stability is both necessary and sufficient for learnability. The approach based on training stability makes no reference to VC dimension or VC entropy. There is no need to prove uniform convergence, and generalization error is bounded directly via an extended McDiarmid inequality. As a result, it potentially allows us to deal with a broader class of learning algorithms than Empirical Risk Minimization. We also explore the relationships among VC dimension, generalization error, and various notions of stability. Several examples of learning algorithms are considered.
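For orientation only (this is not the paper's own training-stability theorem), McDiarmid-style stability bounds have the following general shape; the statement below is the well-known uniform-stability bound of Bousquet and Elisseeff, where $\beta$ is the stability constant of algorithm $A$, $m$ the sample size, $M$ a uniform bound on the loss, and $\delta$ the confidence parameter:

\[
R(A_S) \;\le\; \widehat{R}(A_S) \;+\; 2\beta \;+\; \bigl(4m\beta + M\bigr)\sqrt{\frac{\ln(1/\delta)}{2m}}
\qquad \text{with probability at least } 1-\delta .
\]

Note that the bound is nontrivial whenever $\beta = o(1/\sqrt{m})$, and at no point does it invoke uniform convergence over a hypothesis class.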
Abstract: Frequency counts are a measure of how much use a language makes of a linguistic unit, such as a phoneme or word. However, what is often important is not the units themselves, but the contrasts between them. A measure is therefore needed for how much use a language makes of a contrast, i.e., the functional load (FL) of the contrast. We generalize previous work in linguistics and speech recognition and propose a family of measures for the FL of several phonological contrasts, including phonemic oppositions, distinctive features, suprasegmentals, and phonological rules. We then test these measures for robustness to changes of corpora. Finally, we provide examples in Cantonese, Dutch, English, German and Mandarin, in the context of historical linguistics, language acquisition and speech recognition. More information can be found at http://dinoj.info/research/fload
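One classical instantiation that such a family generalizes is Hockett's entropy-based measure: the FL of a phonemic opposition is the relative drop in entropy when the two phonemes are merged and minimal pairs collapse. A minimal sketch, assuming a toy corpus of phonemically transcribed word tokens (the corpus and function names are hypothetical, for illustration only):

```python
from collections import Counter
from math import log2

def entropy(counts):
    """Shannon entropy (bits) of a frequency distribution."""
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())

def functional_load(corpus, x, y):
    """Entropy-based FL of the opposition x/y, in the spirit of Hockett:
    the relative drop in word-distribution entropy when the two phonemes
    are merged (the contrast is neutralized)."""
    h = entropy(Counter(corpus))
    merged = [w.replace(y, x) for w in corpus]  # neutralize the contrast
    return (h - entropy(Counter(merged))) / h

# Toy corpus: tokens as phoneme strings.
corpus = ["pat", "bat", "pat", "tap", "tab", "bat", "pin", "bin"]
print(functional_load(corpus, "p", "b"))  # p/b collapses several minimal pairs
print(functional_load(corpus, "n", "t"))  # n/t distinguishes no pair here: FL = 0
```

Merging p and b conflates pat/bat, tap/tab, and pin/bin, so entropy drops and FL is positive; merging n and t creates no conflations in this corpus, so its FL is zero despite both phonemes being frequent.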
Abstract: In Phys. Rev. Letters (73:2, 5 Dec. 94), Mantegna et al. conclude on the basis of Zipf rank-frequency data that noncoding DNA sequence regions are more like natural languages than coding regions. We argue on the contrary that an empirical fit to Zipf's ``law'' cannot be used as a criterion for similarity to natural languages. Although DNA is presumably an ``organized system of signs'' in Mandelbrot's (1961) sense, an observation of statistical features of the sort presented in the Mantegna et al. paper does not shed light on the similarity between DNA's ``grammar'' and natural language grammars, just as the observation of exact Zipf-like behavior cannot distinguish between the underlying processes of tossing an $M$-sided die or a finite-state branching process.
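The die-tossing point can be made concrete with Miller's random-typing construction: i.i.d. tosses of an $M$-sided die whose faces are letters and a space produce a rank-frequency curve that approximates a power law, despite the process having no grammar at all. A minimal sketch (alphabet size, sample length, and rank range are arbitrary choices for illustration):

```python
import math
import random
from collections import Counter

# Miller-style "random typing": toss an M-sided die whose faces are a few
# letters plus a space, and read the character stream off as "words".
random.seed(0)
alphabet = "abcd "  # M = 5 equally likely symbols
text = "".join(random.choice(alphabet) for _ in range(1_000_000))
freqs = sorted(Counter(text.split()).values(), reverse=True)

# Log-log slope of the rank-frequency curve between two ranks: the stream
# has no linguistic structure, yet the curve approximates a power law
# (theoretical slope -log 5 / log 4 here), i.e. it "fits" Zipf's law.
r1, r2 = 10, 1000
slope = (math.log(freqs[r2 - 1]) - math.log(freqs[r1 - 1])) / (math.log(r2) - math.log(r1))
print(f"approximate Zipf exponent: {slope:.2f}")
```

Since word frequency here depends only on word length, the exponent is fixed by the alphabet size alone, underscoring that a good Zipf fit reveals nothing about the generating ``grammar''.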