Abstract:Capturing the playing style of professional soccer coaches is a complex, and yet barely explored, task in sports analytics. Nowadays, the availability of digital data describing every relevant spatio-temporal aspect of soccer matches, allows for capturing and analyzing the playing style of players, teams, and coaches in an automatic way. In this paper, we present coach2vec, a workflow to capture the playing style of professional coaches using match event streams and artificial intelligence. Coach2vec extracts ball possessions from each match, clusters them based on their similarity, and reconstructs the typical ball possessions of coaches. Then, it uses an autoencoder, a type of artificial neural network, to obtain a concise representation (encoding) of the playing style of each coach. Our experiments, conducted on soccer-logs describing the last four seasons of the Italian first division, reveal interesting similarities between prominent coaches, paving the road to the simulation of playing styles and the quantitative comparison of professional coaches.
Abstract:The performance of soccer players is one of most discussed aspects by many actors in the soccer industry: from supporters to journalists, from coaches to talent scouts. Unfortunately, the dashboards available online provide no effective way to compare the evolution of the performance of players or to find players behaving similarly on the field. This paper describes the design of a web dashboard that interacts via APIs with a performance evaluation algorithm and provides graphical tools that allow the user to perform many tasks, such as to search or compare players by age, role or trend of growth in their performance, find similar players based on their pitching behavior, change the algorithm's parameters to obtain customized performance scores. We also describe an example of how a talent scout can interact with the dashboard to find young, promising talents.
Abstract:Soccer analytics is attracting increasing interest in academia and industry, thanks to the availability of data that describe all the spatio-temporal events that occur in each match. These events (e.g., passes, shots, fouls) are collected by human operators manually, constituting a considerable cost for data providers in terms of time and economic resources. In this paper, we describe PassNet, a method to recognize the most frequent events in soccer, i.e., passes, from video streams. Our model combines a set of artificial neural networks that perform feature extraction from video streams, object detection to identify the positions of the ball and the players, and classification of frame sequences as passes or not passes. We test PassNet on different scenarios, depending on the similarity of conditions to the match used for training. Our results show good classification results and significant improvement in the accuracy of pass detection with respect to baseline classifiers, even when the match's video conditions of the test and training sets are considerably different. PassNet is the first step towards an automated event annotation system that may break the time and the costs for event annotation, enabling data collections for minor and non-professional divisions, youth leagues and, in general, competitions whose matches are not currently annotated by data providers.
Abstract:Injuries have a great impact on professional soccer, due to their large influence on team performance and the considerable costs of rehabilitation for players. Existing studies in the literature provide just a preliminary understanding of which factors mostly affect injury risk, while an evaluation of the potential of statistical models in forecasting injuries is still missing. In this paper, we propose a multi-dimensional approach to injury forecasting in professional soccer that is based on GPS measurements and machine learning. By using GPS tracking technology, we collect data describing the training workload of players in a professional soccer club during a season. We then construct an injury forecaster and show that it is both accurate and interpretable by providing a set of case studies of interest to soccer practitioners. Our approach opens a novel perspective on injury prevention, providing a set of simple and practical rules for evaluating and interpreting the complex relations between injury risk and training performance in professional soccer.
Abstract:The problem of rating the performance of soccer players is attracting the interest of many companies, websites, and the scientific community, thanks to the availability of massive data capturing all the events generated during a game (e.g., tackles, passes, shots, etc.). Existing approaches fail to fully exploit the richness of the available data and lack of a proper validation. In this paper, we design and implement PlayeRank, a data-driven framework that offers a principled multi-dimensional and role-aware evaluation of the performance of soccer players. We validate the framework through an experimental analysis advised by soccer experts, based on a massive dataset of millions of events pertaining four seasons of the five prominent European leagues. Experiments show that PlayeRank is robust in agreeing with the experts' evaluation of players, significantly improving the state of the art. We also explore an application of PlayeRank --- i.e. searching players --- by introducing a special form of spatial query on the soccer field. This shows its flexibility and efficiency, which makes it worth to be used in the design of a scalable platform for soccer analytics.
Abstract:Humans are routinely asked to evaluate the performance of other individuals, separating success from failure and affecting outcomes from science to education and sports. Yet, in many contexts, the metrics driving the human evaluation process remain unclear. Here we analyse a massive dataset capturing players' evaluations by human judges to explore human perception of performance in soccer, the world's most popular sport. We use machine learning to design an artificial judge which accurately reproduces human evaluation, allowing us to demonstrate how human observers are biased towards diverse contextual features. By investigating the structure of the artificial judge, we uncover the aspects of the players' behavior which attract the attention of human judges, demonstrating that human evaluation is based on a noticeability heuristic where only feature values far from the norm are considered to rate an individual's performance.
Abstract:The availability of massive data about sports activities offers nowadays the opportunity to quantify the relation between performance and success. In this study, we analyze more than 6,000 games and 10 million events in six European leagues and investigate this relation in soccer competitions. We discover that a team's position in a competition's final ranking is significantly related to its typical performance, as described by a set of technical features extracted from the soccer data. Moreover we find that, while victory and defeats can be explained by the team's performance during a game, it is difficult to detect draws by using a machine learning approach. We then simulate the outcomes of an entire season of each league only relying on technical data, i.e. excluding the goals scored, exploiting a machine learning model trained on data from past seasons. The simulation produces a team ranking (the PC ranking) which is close to the actual ranking, suggesting that a complex systems' view on soccer has the potential of revealing hidden patterns regarding the relation between performance and success.