Abstract:In this note we use the State of the Union Address dataset from Kaggle to make some surprising (and some not so surprising) observations about the general timeline of American history and about the character and nature of the addresses themselves. Our main approach uses vector embeddings, such as BERT (DistilBERT) and GPT-2. While it is widely believed that BERT (and its variations) is most suitable for NLP classification tasks, we find that GPT-2, combined with nonlinear dimension-reduction methods such as UMAP, provides better separation and stronger clustering. This makes GPT-2 + UMAP an interesting alternative. In our case, no model fine-tuning is required: the pre-trained, out-of-the-box GPT-2 model is enough. We also use a fine-tuned DistilBERT model for classification (detecting which president delivered which address), with very good results (accuracy of 93%-95%, depending on the run). All computations can be reproduced using the accompanying code on GitHub.
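As an illustration of the kind of pipeline this abstract describes, here is a minimal sketch that embeds texts with a pre-trained, out-of-the-box GPT-2 model and projects the embeddings with UMAP. It assumes the Hugging Face transformers and umap-learn packages; the mean-pooling step and all names below are illustrative assumptions, not the paper's exact recipe (which is in the accompanying GitHub code).

```python
# Minimal sketch: embed documents with pre-trained GPT-2, then project with
# UMAP for clustering/visualization. Assumes `transformers` and `umap-learn`;
# mean pooling over tokens is an illustrative choice, not the paper's recipe.
import torch
from transformers import GPT2Tokenizer, GPT2Model
import umap

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")
model.eval()

def embed(text: str) -> torch.Tensor:
    # Truncate to GPT-2's context window and mean-pool the final hidden states.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state   # shape (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)             # shape (768,)

# Placeholder: the full list of State of the Union texts would go here.
addresses = ["Fellow-Citizens of the Senate ...", "..."]
X = torch.stack([embed(t) for t in addresses]).numpy()

# Nonlinear dimension reduction to 2-D, where cluster separation can be inspected.
X2 = umap.UMAP(n_components=2, random_state=42).fit_transform(X)
```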
Abstract:We completely analyze the convergence speed of the \emph{batch learning algorithm}, and compare it to that of the memoryless learning algorithm and of learning with memory. We show that the batch learning algorithm is never worse than the memoryless learning algorithm (at least asymptotically). Its performance \emph{vis-a-vis} learning with full memory is less clear-cut and depends on certain probabilistic assumptions.
Abstract:We study the convergence properties of a pair of learning algorithms (learning with and without memory). This leads us to study the dominant eigenvalue of a class of random matrices, which turns out to be related to the roots of the derivative of random polynomials (generated by picking their roots uniformly at random in the interval [0, 1], although our results extend to other distributions). This, in turn, requires the study of the statistical behavior of the harmonic mean of random variables as above, which leads us to delicate questions about the rate of convergence to stable laws and tail estimates for stable laws. The reader can find the proofs of most of the results announced here in the paper entitled "Harmonic mean, random polynomials, and random matrices", by the same authors.
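The random-polynomial construction in this abstract (and the next) is concrete enough to simulate. The sketch below, plain NumPy and purely illustrative rather than code from the papers, draws the roots uniformly from [0, 1], forms the polynomial, finds the roots of its derivative, and computes the harmonic mean of the original roots: the three objects the abstracts tie together.

```python
# Numerical illustration (an assumption-laden sketch, not the authors' code):
# a degree-n polynomial whose roots are i.i.d. uniform on [0, 1], the roots
# of its derivative, and the harmonic mean of the original roots.
import numpy as np

rng = np.random.default_rng(0)
n = 10
roots = rng.uniform(0.0, 1.0, size=n)              # roots picked uniformly in [0, 1]

coeffs = np.poly(roots)                            # coefficients of prod_i (x - r_i)
crit = np.sort(np.roots(np.polyder(coeffs)).real)  # roots of the derivative p'
# (all real by Rolle's theorem; .real strips numerical noise)

h_mean = n / np.sum(1.0 / roots)                   # harmonic mean of the roots

print("roots of p'   :", crit)
print("harmonic mean :", h_mean)
```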
Abstract:Motivated by a problem in learning theory, we are led to study the dominant eigenvalue of a class of random matrices. This turns out to be related to the roots of the derivative of random polynomials (generated by picking their roots uniformly at random in the interval [0, 1], although our results extend to other distributions). This, in turn, requires the study of the statistical behavior of the harmonic mean of random variables as above, and that, in turn, leads us to delicate questions about the rate of convergence to stable laws and tail estimates for stable laws.
Abstract:We study the convergence speed of the batch learning algorithm, and compare it to that of the memoryless learning algorithm and of learning with memory (as analyzed in joint work with N. Komarova). We obtain precise results and show in particular that the batch learning algorithm is never worse than the memoryless learning algorithm (at least asymptotically). Its performance vis-a-vis learning with full memory is less clear-cut and depends on certain probabilistic assumptions. These results necessitate the introduction of the moment zeta function of a probability distribution and the study of some of its properties.
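For orientation, the name "moment zeta function" suggests the following definition; this is recalled here as an assumption for the reader's convenience, not quoted from the paper, and should be checked against the paper itself. For a probability measure $\mu$ on $[0,1]$ with moments $m_n(\mu)$, set
\[
  \zeta_\mu(s) \;=\; \sum_{n=1}^{\infty} \bigl( m_n(\mu) \bigr)^{s},
  \qquad m_n(\mu) = \int_0^1 x^n \, d\mu(x).
\]
For example, for the uniform distribution on $[0,1]$ one has $m_n = 1/(n+1)$, so that $\zeta_\mu(s) = \zeta(s) - 1$, where $\zeta$ is the Riemann zeta function; this justifies the "zeta" in the name.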