Abstract:Automated grading has become an essential tool in education technology due to its ability to efficiently assess large volumes of student work, provide consistent and unbiased evaluations, and deliver immediate feedback to enhance learning. However, current systems face significant limitations, including the need for large datasets in few-shot learning methods, a lack of personalized and actionable feedback, and an overemphasis on benchmark performance rather than student experience. To address these challenges, we propose a Zero-Shot Large Language Model (LLM)-Based Automated Assignment Grading (AAG) system. This framework leverages prompt engineering to evaluate both computational and explanatory student responses without requiring additional training or fine-tuning. The AAG system delivers tailored feedback that highlights individual strengths and areas for improvement, thereby enhancing student learning outcomes. Our study demonstrates the system's effectiveness through comprehensive evaluations, including survey responses from higher education students that indicate significant improvements in motivation, understanding, and preparedness compared to traditional grading methods. The results validate the AAG system's potential to transform educational assessment by prioritizing learning experiences and providing scalable, high-quality feedback.
Abstract:In soccer video analysis, player detection is essential for identifying key events and reconstructing tactical positions. The presence of numerous players and frequent occlusions, combined with copyright restrictions, severely restricts the availability of datasets, leaving limited options such as SoccerNet-Tracking and SportsMOT. These datasets suffer from a lack of diversity, which hinders algorithms from adapting effectively to varied soccer video contexts. To address these challenges, we developed SoccerSynth-Detection, the first synthetic dataset designed for the detection of synthetic soccer players. It includes a broad range of random lighting and textures, as well as simulated camera motion blur. We validated its efficacy using the object detection model (Yolov8n) against real-world datasets (SoccerNet-Tracking and SportsMoT). In transfer tests, it matched the performance of real datasets and significantly outperformed them in images with motion blur; in pre-training tests, it demonstrated its efficacy as a pre-training dataset, significantly enhancing the algorithm's overall performance. Our work demonstrates the potential of synthetic datasets to replace real datasets for algorithm training in the field of soccer video analysis.
Abstract:Multi-object tracking (MOT) is crucial for various multi-agent analyses such as evaluating team sports tactics and player movements and performance. While pedestrian tracking has advanced with Tracking-by-Detection MOT, team sports like basketball pose unique challenges. These challenges include players' unpredictable movements, frequent close interactions, and visual similarities that complicate pose labeling and lead to significant occlusions, frequent ID switches, and high manual annotation costs. To address these challenges, we propose a novel pose-based virtual marker (VM) MOT method for team sports, named Sports-vmTracking. This method builds on the vmTracking approach developed for multi-animal tracking with active learning. First, we constructed a 3x3 basketball pose dataset for VMs and applied active learning to enhance model performance in generating VMs. Then, we overlaid the VMs on video to identify players, extract their poses with unique IDs, and convert these into bounding boxes for comparison with automated MOT methods. Using our 3x3 basketball dataset, we demonstrated that our VM configuration has been highly effective, and reduced the need for manual corrections and labeling during pose model training while maintaining high accuracy. Our approach achieved an average HOTA score of 72.3%, over 10 points higher than other state-of-the-art methods without VM, and resulted in 0 ID switches. Beyond improving performance in handling occlusions and minimizing ID switches, our framework could substantially increase the time and cost efficiency compared to traditional manual annotation.
Abstract:Quantum signal processing (QSP) and quantum singular value transformation (QSVT) have provided a unified framework for understanding many quantum algorithms, including factorization, matrix inversion, and Hamiltonian simulation. As a multivariable version of QSP, multivariable quantum signal processing (M-QSP) is proposed. M-QSP interleaves signal operators corresponding to each variable with signal processing operators, which provides an efficient means to perform multivariable polynomial transformations. However, the necessary and sufficient condition for what types of polynomials can be constructed by M-QSP is unknown. In this paper, we propose a classical algorithm to determine whether a given pair of multivariable Laurent polynomials can be implemented by M-QSP, which returns True or False. As one of the most important properties of this algorithm, it returning True is the necessary and sufficient condition. The proposed classical algorithm runs in polynomial time in the number of variables and signal operators. Our algorithm also provides a constructive method to select the necessary parameters for implementing M-QSP. These findings offer valuable insights for identifying practical applications of M-QSP.
Abstract:Pedestrian action prediction is of great significance for many applications such as autonomous driving. However, state-of-the-art methods lack explainability to make trustworthy predictions. In this paper, a novel framework called MulCPred is proposed that explains its predictions based on multi-modal concepts represented by training samples. Previous concept-based methods have limitations including: 1) they cannot directly apply to multi-modal cases; 2) they lack locality to attend to details in the inputs; 3) they suffer from mode collapse. These limitations are tackled accordingly through the following approaches: 1) a linear aggregator to integrate the activation results of the concepts into predictions, which associates concepts of different modalities and provides ante-hoc explanations of the relevance between the concepts and the predictions; 2) a channel-wise recalibration module that attends to local spatiotemporal regions, which enables the concepts with locality; 3) a feature regularization loss that encourages the concepts to learn diverse patterns. MulCPred is evaluated on multiple datasets and tasks. Both qualitative and quantitative results demonstrate that MulCPred is promising in improving the explainability of pedestrian action prediction without obvious performance degradation. Furthermore, by removing unrecognizable concepts from MulCPred, the cross-dataset prediction performance is improved, indicating the feasibility of further generalizability of MulCPred.
Abstract:The study of collective animal behavior, especially in aquatic environments, presents unique challenges and opportunities for understanding movement and interaction patterns in the field of ethology, ecology, and bio-navigation. The Fish Tracking Challenge 2024 (https://ftc-2024.github.io/) introduces a multi-object tracking competition focused on the intricate behaviors of schooling sweetfish. Using the SweetFish dataset, participants are tasked with developing advanced tracking models to accurately monitor the locations of 10 sweetfishes simultaneously. This paper introduces the competition's background, objectives, the SweetFish dataset, and the appraoches of the 1st to 3rd winners and our baseline. By leveraging video data and bounding box annotations, the competition aims to foster innovation in automatic detection and tracking algorithms, addressing the complexities of aquatic animal movements. The challenge provides the importance of multi-object tracking for discovering the dynamics of collective animal behavior, with the potential to significantly advance scientific understanding in the above fields.
Abstract:Understanding human actions from videos is essential in many domains, including sports. In figure skating, technical judgments are performed by watching skaters' 3D movements, and its part of the judging procedure can be regarded as a Temporal Action Segmentation (TAS) task. TAS tasks in figure skating that automatically assign temporal semantics to video are actively researched. However, there is a lack of datasets and effective methods for TAS tasks requiring 3D pose data. In this study, we first created the FS-Jump3D dataset of complex and dynamic figure skating jumps using optical markerless motion capture. We also propose a new fine-grained figure skating jump TAS dataset annotation method with which TAS models can learn jump procedures. In the experimental results, we validated the usefulness of 3D pose features as input and the fine-grained dataset for the TAS model in figure skating. FS-Jump3D Dataset is available at https://github.com/ryota-skating/FS-Jump3D.
Abstract:Recent deep learning-based object detection approaches have led to significant progress in multi-object tracking (MOT) algorithms. The current MOT methods mainly focus on pedestrian or vehicle scenes, but basketball sports scenes are usually accompanied by three or more object occlusion problems with similar appearances and high-intensity complex motions, which we call complex multi-object occlusion (CMOO). Here, we propose an online and robust MOT approach, named Basketball-SORT, which focuses on the CMOO problems in basketball videos. To overcome the CMOO problem, instead of using the intersection-over-union-based (IoU-based) approach, we use the trajectories of neighboring frames based on the projected positions of the players. Our method designs the basketball game restriction (BGR) and reacquiring Long-Lost IDs (RLLI) based on the characteristics of basketball scenes, and we also solve the occlusion problem based on the player trajectories and appearance features. Experimental results show that our method achieves a Higher Order Tracking Accuracy (HOTA) score of 63.48$\%$ on the basketball fixed video dataset and outperforms other recent popular approaches. Overall, our approach solved the CMOO problem more effectively than recent MOT algorithms.
Abstract:In professional basketball, the accurate prediction of scoring opportunities based on strategic decision-making is crucial for space and player evaluations. However, traditional models often face challenges in accounting for the complexities of off-ball movements, which are essential for accurate predictive performance. In this study, we propose two mathematical models to predict off-ball scoring opportunities in basketball, considering both pass-to-score and dribble-to-score movements: the Ball Movement for Off-ball Scoring (BMOS) and the Ball Intercept and Movement for Off-ball Scoring (BIMOS) models. The BMOS adapts principles from the Off-Ball Scoring Opportunities (OBSO) model, originally designed for soccer, to basketball, whereas the BIMOS also incorporates the likelihood of interception during ball movements. We evaluated these models using player tracking data from 630 NBA games in the 2015-2016 regular season, demonstrating that the BIMOS outperforms the BMOS in terms of scoring prediction accuracy. Thus, our models provide valuable insights for tactical analysis and player evaluation in basketball.
Abstract:Image understanding is a foundational task in computer vision, with recent applications emerging in soccer posture analysis. However, existing publicly available datasets lack comprehensive information, notably in the form of posture sequences and 2D pose annotations. Moreover, current analysis models often rely on interpretable linear models (e.g., PCA and regression), limiting their capacity to capture non-linear spatiotemporal relationships in complex and diverse scenarios. To address these gaps, we introduce the 3D Shot Posture (3DSP) dataset in soccer broadcast videos, which represents the most extensive sports image dataset with 2D pose annotations to our knowledge. Additionally, we present the 3DSP-GRAE (Graph Recurrent AutoEncoder) model, a non-linear approach for embedding pose sequences. Furthermore, we propose AutoSoccerPose, a pipeline aimed at semi-automating 2D and 3D pose estimation and posture analysis. While achieving full automation proved challenging, we provide a foundational baseline, extending its utility beyond the scope of annotated data. We validate AutoSoccerPose on SoccerNet and 3DSP datasets, and present posture analysis results based on 3DSP. The dataset, code, and models are available at: https://github.com/calvinyeungck/3D-Shot-Posture-Dataset.