Abstract:Human pose estimation is a critical task in computer vision and sports biomechanics, with applications spanning sports science, rehabilitation, and biomechanical research. While significant progress has been made in monocular 3D pose estimation, current datasets often fail to capture the complex, high-acceleration movements typical of competitive sports. In this work, we introduce AthletePose3D, a novel dataset designed to address this gap. AthletePose3D includes 12 types of sports motions across various disciplines, with approximately 1.3 million frames and 165 thousand individual postures, specifically capturing high-speed, high-acceleration athletic movements. We evaluate state-of-the-art (SOTA) monocular 2D and 3D pose estimation models on the dataset, revealing that models trained on conventional datasets perform poorly on athletic motions. However, fine-tuning these models on AthletePose3D notably reduces the SOTA model mean per joint position error (MPJPE) from 214mm to 65mm-a reduction of over 69%. We also validate the kinematic accuracy of monocular pose estimations through waveform analysis, highlighting strong correlations in joint angle estimations but limitations in velocity estimation. Our work provides a comprehensive evaluation of monocular pose estimation models in the context of sports, contributing valuable insights for advancing monocular pose estimation techniques in high-performance sports environments. The dataset, code, and model checkpoints are available at: https://github.com/calvinyeungck/AthletePose3D
Abstract:Sports analytics has become both more professional and sophisticated, driven by the growing availability of detailed performance data. This progress enables applications such as match outcome prediction, player scouting, and tactical analysis. In soccer, the effective utilization of event and tracking data is fundamental for capturing and analyzing the dynamics of the game. However, there are two primary challenges: the limited availability of event data, primarily restricted to top-tier teams and leagues, and the scarcity and high cost of tracking data, which complicates its integration with event data for comprehensive analysis. Here we propose OpenSTARLab, an open-source framework designed to democratize spatio-temporal agent data analysis in sports by addressing these key challenges. OpenSTARLab includes the Pre-processing Package that standardizes event and tracking data through Unified and Integrated Event Data and State-Action-Reward formats, the Event Modeling Package that implements deep learning-based event prediction, alongside the RLearn Package for reinforcement learning tasks. These technical components facilitate the handling of diverse data sources and support advanced analytical tasks, thereby enhancing the overall functionality and usability of the framework. To assess OpenSTARLab's effectiveness, we conducted several experimental evaluations. These demonstrate the superior performance of the specific event prediction model in terms of action and time prediction accuracies and maintained its robust event simulation performance. Furthermore, reinforcement learning experiments reveal a trade-off between action accuracy and temporal difference loss and show comprehensive visualization. Overall, OpenSTARLab serves as a robust platform for researchers and practitioners, enhancing innovation and collaboration in the field of soccer data analytics.
Abstract:Automated grading has become an essential tool in education technology due to its ability to efficiently assess large volumes of student work, provide consistent and unbiased evaluations, and deliver immediate feedback to enhance learning. However, current systems face significant limitations, including the need for large datasets in few-shot learning methods, a lack of personalized and actionable feedback, and an overemphasis on benchmark performance rather than student experience. To address these challenges, we propose a Zero-Shot Large Language Model (LLM)-Based Automated Assignment Grading (AAG) system. This framework leverages prompt engineering to evaluate both computational and explanatory student responses without requiring additional training or fine-tuning. The AAG system delivers tailored feedback that highlights individual strengths and areas for improvement, thereby enhancing student learning outcomes. Our study demonstrates the system's effectiveness through comprehensive evaluations, including survey responses from higher education students that indicate significant improvements in motivation, understanding, and preparedness compared to traditional grading methods. The results validate the AAG system's potential to transform educational assessment by prioritizing learning experiences and providing scalable, high-quality feedback.
Abstract:In soccer video analysis, player detection is essential for identifying key events and reconstructing tactical positions. The presence of numerous players and frequent occlusions, combined with copyright restrictions, severely restricts the availability of datasets, leaving limited options such as SoccerNet-Tracking and SportsMOT. These datasets suffer from a lack of diversity, which hinders algorithms from adapting effectively to varied soccer video contexts. To address these challenges, we developed SoccerSynth-Detection, the first synthetic dataset designed for the detection of synthetic soccer players. It includes a broad range of random lighting and textures, as well as simulated camera motion blur. We validated its efficacy using the object detection model (Yolov8n) against real-world datasets (SoccerNet-Tracking and SportsMoT). In transfer tests, it matched the performance of real datasets and significantly outperformed them in images with motion blur; in pre-training tests, it demonstrated its efficacy as a pre-training dataset, significantly enhancing the algorithm's overall performance. Our work demonstrates the potential of synthetic datasets to replace real datasets for algorithm training in the field of soccer video analysis.
Abstract:Multi-object tracking (MOT) is crucial for various multi-agent analyses such as evaluating team sports tactics and player movements and performance. While pedestrian tracking has advanced with Tracking-by-Detection MOT, team sports like basketball pose unique challenges. These challenges include players' unpredictable movements, frequent close interactions, and visual similarities that complicate pose labeling and lead to significant occlusions, frequent ID switches, and high manual annotation costs. To address these challenges, we propose a novel pose-based virtual marker (VM) MOT method for team sports, named Sports-vmTracking. This method builds on the vmTracking approach developed for multi-animal tracking with active learning. First, we constructed a 3x3 basketball pose dataset for VMs and applied active learning to enhance model performance in generating VMs. Then, we overlaid the VMs on video to identify players, extract their poses with unique IDs, and convert these into bounding boxes for comparison with automated MOT methods. Using our 3x3 basketball dataset, we demonstrated that our VM configuration has been highly effective, and reduced the need for manual corrections and labeling during pose model training while maintaining high accuracy. Our approach achieved an average HOTA score of 72.3%, over 10 points higher than other state-of-the-art methods without VM, and resulted in 0 ID switches. Beyond improving performance in handling occlusions and minimizing ID switches, our framework could substantially increase the time and cost efficiency compared to traditional manual annotation.
Abstract:The SoccerNet 2024 challenges represent the fourth annual video understanding challenges organized by the SoccerNet team. These challenges aim to advance research across multiple themes in football, including broadcast video understanding, field understanding, and player understanding. This year, the challenges encompass four vision-based tasks. (1) Ball Action Spotting, focusing on precisely localizing when and which soccer actions related to the ball occur, (2) Dense Video Captioning, focusing on describing the broadcast with natural language and anchored timestamps, (3) Multi-View Foul Recognition, a novel task focusing on analyzing multiple viewpoints of a potential foul incident to classify whether a foul occurred and assess its severity, (4) Game State Reconstruction, another novel task focusing on reconstructing the game state from broadcast videos onto a 2D top-view map of the field. Detailed information about the tasks, challenges, and leaderboards can be found at https://www.soccer-net.org, with baselines and development kits available at https://github.com/SoccerNet.
Abstract:Recent deep learning-based object detection approaches have led to significant progress in multi-object tracking (MOT) algorithms. The current MOT methods mainly focus on pedestrian or vehicle scenes, but basketball sports scenes are usually accompanied by three or more object occlusion problems with similar appearances and high-intensity complex motions, which we call complex multi-object occlusion (CMOO). Here, we propose an online and robust MOT approach, named Basketball-SORT, which focuses on the CMOO problems in basketball videos. To overcome the CMOO problem, instead of using the intersection-over-union-based (IoU-based) approach, we use the trajectories of neighboring frames based on the projected positions of the players. Our method designs the basketball game restriction (BGR) and reacquiring Long-Lost IDs (RLLI) based on the characteristics of basketball scenes, and we also solve the occlusion problem based on the player trajectories and appearance features. Experimental results show that our method achieves a Higher Order Tracking Accuracy (HOTA) score of 63.48$\%$ on the basketball fixed video dataset and outperforms other recent popular approaches. Overall, our approach solved the CMOO problem more effectively than recent MOT algorithms.
Abstract:Image understanding is a foundational task in computer vision, with recent applications emerging in soccer posture analysis. However, existing publicly available datasets lack comprehensive information, notably in the form of posture sequences and 2D pose annotations. Moreover, current analysis models often rely on interpretable linear models (e.g., PCA and regression), limiting their capacity to capture non-linear spatiotemporal relationships in complex and diverse scenarios. To address these gaps, we introduce the 3D Shot Posture (3DSP) dataset in soccer broadcast videos, which represents the most extensive sports image dataset with 2D pose annotations to our knowledge. Additionally, we present the 3DSP-GRAE (Graph Recurrent AutoEncoder) model, a non-linear approach for embedding pose sequences. Furthermore, we propose AutoSoccerPose, a pipeline aimed at semi-automating 2D and 3D pose estimation and posture analysis. While achieving full automation proved challenging, we provide a foundational baseline, extending its utility beyond the scope of annotated data. We validate AutoSoccerPose on SoccerNet and 3DSP datasets, and present posture analysis results based on 3DSP. The dataset, code, and models are available at: https://github.com/calvinyeungck/3D-Shot-Posture-Dataset.
Abstract:Deep learning has achieved remarkable success in recent years. Central to its success is its ability to learn representations that preserve task-relevant structure. However, massive energy, compute, and data costs are required to learn general representations. This paper explores Hyperdimensional Computing (HDC), a computationally and data-efficient brain-inspired alternative. HDC acts as a bridge between connectionist and symbolic approaches to artificial intelligence (AI), allowing explicit specification of representational structure as in symbolic approaches while retaining the flexibility of connectionist approaches. However, HDC's simplicity poses challenges for encoding complex compositional structures, especially in its binding operation. To address this, we propose Generalized Holographic Reduced Representations (GHRR), an extension of Fourier Holographic Reduced Representations (FHRR), a specific HDC implementation. GHRR introduces a flexible, non-commutative binding operation, enabling improved encoding of complex data structures while preserving HDC's desirable properties of robustness and transparency. In this work, we introduce the GHRR framework, prove its theoretical properties and its adherence to HDC properties, explore its kernel and binding characteristics, and perform empirical experiments showcasing its flexible non-commutativity, enhanced decoding accuracy for compositional structures, and improved memorization capacity compared to FHRR.
Abstract:Vector Symbolic Architectures (VSAs) have emerged as a novel framework for enabling interpretable machine learning algorithms equipped with the ability to reason and explain their decision processes. The basic idea is to represent discrete information through high dimensional random vectors. Complex data structures can be built up with operations over vectors such as the "binding" operation involving element-wise vector multiplication, which associates data together. The reverse task of decomposing the associated elements is a combinatorially hard task, with an exponentially large search space. The main algorithm for performing this search is the resonator network, inspired by Hopfield network-based memory search operations. In this work, we introduce a new variant of the resonator network, based on self-attention based update rules in the iterative search problem. This update rule, based on the Hopfield network with log-sum-exp energy function and norm-bounded states, is shown to substantially improve the performance and rate of convergence. As a result, our algorithm enables a larger capacity for associative memory, enabling applications in many tasks like perception based pattern recognition, scene decomposition, and object reasoning. We substantiate our algorithm with a thorough evaluation and comparisons to baselines.