Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jerrin Bright

Ice Hockey Puck Localization Using Contextual Cues

Jun 04, 2025

Liam Salass, Jerrin Bright, Amir Nazemi, Yuhao Chen, John Zelek, David Clausi

Abstract:Puck detection in ice hockey broadcast videos poses significant challenges due to the puck's small size, frequent occlusions, motion blur, broadcast artifacts, and scale inconsistencies due to varying camera zoom and broadcast camera viewpoints. Prior works focus on appearance-based or motion-based cues of the puck without explicitly modelling the cues derived from player behaviour. Players consistently turn their bodies and direct their gaze toward the puck. Motivated by this strong contextual cue, we propose Puck Localization Using Contextual Cues (PLUCC), a novel approach for scale-aware and context-driven single-frame puck detections. PLUCC consists of three components: (a) a contextual encoder, which utilizes player orientations and positioning as helpful priors; (b) a feature pyramid encoder, which extracts multiscale features from the dual encoders; and (c) a gating decoder that combines latent features with a channel gating mechanism. For evaluation, in addition to standard average precision, we propose Rink Space Localization Error (RSLE), a scale-invariant homography-based metric for removing perspective bias from rink space evaluation. The experimental results of PLUCC on the PuckDataset dataset demonstrated state-of-the-art detection performance, surpassing previous baseline methods by an average precision improvement of 12.2% and RSLE average precision of 25%. Our research demonstrates the critical role of contextual understanding in improving puck detection performance, with broad implications for automated sports analysis.

Via

Access Paper or Ask Questions

PitcherNet: Powering the Moneyball Evolution in Baseball Video Analytics

May 13, 2024

Jerrin Bright, Bavesh Balaji, Yuhao Chen, David A Clausi, John S Zelek

Abstract:In the high-stakes world of baseball, every nuance of a pitcher's mechanics holds the key to maximizing performance and minimizing runs. Traditional analysis methods often rely on pre-recorded offline numerical data, hindering their application in the dynamic environment of live games. Broadcast video analysis, while seemingly ideal, faces significant challenges due to factors like motion blur and low resolution. To address these challenges, we introduce PitcherNet, an end-to-end automated system that analyzes pitcher kinematics directly from live broadcast video, thereby extracting valuable pitch statistics including velocity, release point, pitch position, and release extension. This system leverages three key components: (1) Player tracking and identification by decoupling actions from player kinematics; (2) Distribution and depth-aware 3D human modeling; and (3) Kinematic-driven pitch statistics. Experimental validation demonstrates that PitcherNet achieves robust analysis results with 96.82% accuracy in pitcher tracklet identification, reduced joint position error by 1.8mm and superior analytics compared to baseline methods. By enabling performance-critical kinematic analysis from broadcast video, PitcherNet paves the way for the future of baseball analytics by optimizing pitching strategies, preventing injuries, and unlocking a deeper understanding of pitcher mechanics, forever transforming the game.

* IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW'24)

Via

Access Paper or Ask Questions

Domain-Guided Masked Autoencoders for Unique Player Identification

Mar 17, 2024

Bavesh Balaji, Jerrin Bright, Sirisha Rambhatla, Yuhao Chen, Alexander Wong, John Zelek, David A Clausi

Figure 1 for Domain-Guided Masked Autoencoders for Unique Player Identification

Figure 2 for Domain-Guided Masked Autoencoders for Unique Player Identification

Figure 3 for Domain-Guided Masked Autoencoders for Unique Player Identification

Figure 4 for Domain-Guided Masked Autoencoders for Unique Player Identification

Abstract:Unique player identification is a fundamental module in vision-driven sports analytics. Identifying players from broadcast videos can aid with various downstream tasks such as player assessment, in-game analysis, and broadcast production. However, automatic detection of jersey numbers using deep features is challenging primarily due to: a) motion blur, b) low resolution video feed, and c) occlusions. With their recent success in various vision tasks, masked autoencoders (MAEs) have emerged as a superior alternative to conventional feature extractors. However, most MAEs simply zero-out image patches either randomly or focus on where to mask rather than how to mask. Motivated by human vision, we devise a novel domain-guided masking policy for MAEs termed d-MAE to facilitate robust feature extraction in the presence of motion blur for player identification. We further introduce a new spatio-temporal network leveraging our novel d-MAE for unique player identification. We conduct experiments on three large-scale sports datasets, including a curated baseball dataset, the SoccerNet dataset, and an in-house ice hockey dataset. We preprocess the datasets using an upgraded keyframe identification (KfID) module by focusing on frames containing jersey numbers. Additionally, we propose a keyframe-fusion technique to augment keyframes, preserving spatial and temporal context. Our spatio-temporal network showcases significant improvements, surpassing the current state-of-the-art by 8.58%, 4.29%, and 1.20% in the test set accuracies, respectively. Rigorous ablations highlight the effectiveness of our domain-guided masking approach and the refined KfID module, resulting in performance enhancements of 1.48% and 1.84% respectively, compared to original architectures.

* Submitted to 21st International Conference on Robots and Vision (CRV'24), Guelph, Ontario, Canada

Via

Access Paper or Ask Questions

Distribution and Depth-Aware Transformers for 3D Human Mesh Recovery

Mar 14, 2024

Jerrin Bright, Bavesh Balaji, Harish Prakash, Yuhao Chen, David A Clausi, John Zelek

Figure 1 for Distribution and Depth-Aware Transformers for 3D Human Mesh Recovery

Figure 2 for Distribution and Depth-Aware Transformers for 3D Human Mesh Recovery

Figure 3 for Distribution and Depth-Aware Transformers for 3D Human Mesh Recovery

Figure 4 for Distribution and Depth-Aware Transformers for 3D Human Mesh Recovery

Abstract:Precise Human Mesh Recovery (HMR) with in-the-wild data is a formidable challenge and is often hindered by depth ambiguities and reduced precision. Existing works resort to either pose priors or multi-modal data such as multi-view or point cloud information, though their methods often overlook the valuable scene-depth information inherently present in a single image. Moreover, achieving robust HMR for out-of-distribution (OOD) data is exceedingly challenging due to inherent variations in pose, shape and depth. Consequently, understanding the underlying distribution becomes a vital subproblem in modeling human forms. Motivated by the need for unambiguous and robust human modeling, we introduce Distribution and depth-aware human mesh recovery (D2A-HMR), an end-to-end transformer architecture meticulously designed to minimize the disparity between distributions and incorporate scene-depth leveraging prior depth information. Our approach demonstrates superior performance in handling OOD data in certain scenarios while consistently achieving competitive results against state-of-the-art HMR methods on controlled datasets.

* Submitted to 21st International Conference on Robots and Vision (CRV'24), Guelph, Ontario, Canada

Via

Access Paper or Ask Questions

Jersey Number Recognition using Keyframe Identification from Low-Resolution Broadcast Videos

Sep 12, 2023

Bavesh Balaji, Jerrin Bright, Harish Prakash, Yuhao Chen, David A Clausi, John Zelek

Abstract:Player identification is a crucial component in vision-driven soccer analytics, enabling various downstream tasks such as player assessment, in-game analysis, and broadcast production. However, automatically detecting jersey numbers from player tracklets in videos presents challenges due to motion blur, low resolution, distortions, and occlusions. Existing methods, utilizing Spatial Transformer Networks, CNNs, and Vision Transformers, have shown success in image data but struggle with real-world video data, where jersey numbers are not visible in most of the frames. Hence, identifying frames that contain the jersey number is a key sub-problem to tackle. To address these issues, we propose a robust keyframe identification module that extracts frames containing essential high-level information about the jersey number. A spatio-temporal network is then employed to model spatial and temporal context and predict the probabilities of jersey numbers in the video. Additionally, we adopt a multi-task loss function to predict the probability distribution of each digit separately. Extensive evaluations on the SoccerNet dataset demonstrate that incorporating our proposed keyframe identification module results in a significant 37.81% and 37.70% increase in the accuracies of 2 different test sets with domain gaps. These results highlight the effectiveness and importance of our approach in tackling the challenges of automatic jersey number detection in sports videos.

* Accepted in the 6th International Workshop on Multimedia Content Analysis in Sports (MMSports'23) @ ACM Multimedia

Via

Access Paper or Ask Questions

Mitigating Motion Blur for Robust 3D Baseball Player Pose Modeling for Pitch Analysis

Sep 02, 2023

Jerrin Bright, Yuhao Chen, John Zelek

Abstract:Using videos to analyze pitchers in baseball can play a vital role in strategizing and injury prevention. Computer vision-based pose analysis offers a time-efficient and cost-effective approach. However, the use of accessible broadcast videos, with a 30fps framerate, often results in partial body motion blur during fast actions, limiting the performance of existing pose keypoint estimation models. Previous works have primarily relied on fixed backgrounds, assuming minimal motion differences between frames, or utilized multiview data to address this problem. To this end, we propose a synthetic data augmentation pipeline to enhance the model's capability to deal with the pitcher's blurry actions. In addition, we leverage in-the-wild videos to make our model robust under different real-world conditions and camera positions. By carefully optimizing the augmentation parameters, we observed a notable reduction in the loss by 54.2% and 36.2% on the test dataset for 2D and 3D pose estimation respectively. By applying our approach to existing state-of-the-art pose estimators, we demonstrate an average improvement of 29.2%. The findings highlight the effectiveness of our method in mitigating the challenges posed by motion blur, thereby enhancing the overall quality of pose estimation.

* Accepted in the 6th International Workshop on Multimedia Content Analysis in Sports (MMSports'23) @ ACM Multimedia

Via

Access Paper or Ask Questions

ME-CapsNet: A Multi-Enhanced Capsule Networks with Routing Mechanism

Mar 31, 2022

Jerrin Bright, Suryaprakash Rajkumar, Arockia Selvakumar Arockia Doss

Figure 1 for ME-CapsNet: A Multi-Enhanced Capsule Networks with Routing Mechanism

Figure 2 for ME-CapsNet: A Multi-Enhanced Capsule Networks with Routing Mechanism

Figure 3 for ME-CapsNet: A Multi-Enhanced Capsule Networks with Routing Mechanism

Figure 4 for ME-CapsNet: A Multi-Enhanced Capsule Networks with Routing Mechanism

Abstract:Convolutional Neural Networks need the construction of informative features, which are determined by channel-wise and spatial-wise information at the network's layers. In this research, we focus on bringing in a novel solution that uses sophisticated optimization for enhancing both the spatial and channel components inside each layer's receptive field. Capsule Networks were used to understand the spatial association between features in the feature map. Standalone capsule networks have shown good results on comparatively simple datasets than on complex datasets as a result of the inordinate amount of feature information. Thus, to tackle this issue, we have proposed ME-CapsNet by introducing deeper convolutional layers to extract important features before passing through modules of capsule layers strategically to improve the performance of the network significantly. The deeper convolutional layer includes blocks of Squeeze-Excitation networks which use a stochastic sampling approach for progressively reducing the spatial size thereby dynamically recalibrating the channels by reconstructing their interdependencies without much loss of important feature information. Extensive experimentation was done using commonly used datasets demonstrating the efficiency of the proposed ME-CapsNet, which clearly outperforms various research works by achieving higher accuracy with minimal model complexity in complex datasets.

* Submitted to 8th IEEE International Conference on Electronics, Computing and Communication Technologies (IEEE CONECCT'22)

Via

Access Paper or Ask Questions