Sungjun Lee

Short-Window Sliding Learning for Real-Time Violence Detection via LLM-based Auto-Labeling

Nov 14, 2025

Seoik Jung, Taekyung Song, Yangro Lee, Sungjun Lee

Abstract: This paper proposes a Short-Window Sliding Learning framework for real-time violence detection in CCTV footage. Unlike conventional approaches that train on long videos, the proposed method divides videos into 1-2 second clips and applies Large Language Model (LLM)-based auto-caption labeling to construct fine-grained datasets. Each short clip uses all of its frames to preserve temporal continuity, enabling precise recognition of rapid violent events. Experiments demonstrate that the proposed method achieves 95.25% accuracy on RWF-2000 and significantly improves performance on long videos (UCF-Crime: 83.25%), confirming its strong generalization and real-time applicability in intelligent surveillance systems.

* 5 pages, 2 figures. Accepted paper for the IEIE (Institute of Electronics and Information Engineers) Fall Conference 2025. Presentation on Nov 27, 2025
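
The clip-splitting step described in the abstract can be pictured with a minimal sketch. This is not the authors' implementation: the function name `sliding_clips`, the `stride_sec` parameter, and the choice of overlapping windows are assumptions made for illustration; the abstract only states that videos are divided into 1-2 second clips and that every frame in a clip is used.

```python
def sliding_clips(num_frames, fps, window_sec=1.0, stride_sec=0.5):
    """Return (start, end) frame ranges, end exclusive, for short clips.

    Hypothetical helper: window_sec follows the 1-2 second clip length
    mentioned in the abstract; stride_sec (the overlap) is an assumption.
    """
    window = max(1, int(round(window_sec * fps)))
    stride = max(1, int(round(stride_sec * fps)))
    clips = []
    start = 0
    # Keep only full-length windows so every clip has the same frame count.
    while start + window <= num_frames:
        clips.append((start, start + window))
        start += stride
    return clips


# A 3-second video at 30 fps, split into 1-second clips with a 0.5-second stride.
clips = sliding_clips(num_frames=90, fps=30, window_sec=1.0, stride_sec=0.5)
```

Each returned range covers every frame in its window, matching the abstract's statement that no frames are skipped inside a clip; each range would then be passed to a captioning or labeling step.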

Noise-aware Learning from Web-crawled Image-Text Data for Image Captioning

Dec 27, 2022

Wooyoung Kang, Jonghwan Mun, Sungjun Lee, Byungseok Roh

Abstract: Image captioning is one of the straightforward tasks that can take advantage of large-scale web-crawled data, which provides rich knowledge about the visual world for a captioning model. However, since web-crawled data contains image-text pairs that are aligned at different levels, the inherent noise (e.g., misaligned pairs) makes it difficult to learn a precise captioning model. While a filtering strategy can effectively remove noisy data, it reduces the amount of learnable knowledge and sometimes introduces a new problem of data deficiency. To take the best of both worlds, we propose a noise-aware learning framework, which learns rich knowledge from the whole web-crawled data while being less affected by the noise. This is achieved by the proposed quality-controllable model, which is trained using the alignment levels of the image-text pairs as an additional control signal. The alignment-conditioned training allows the model to generate high-quality, well-aligned captions by simply setting the control signal to the desired alignment level at inference time. Through in-depth analysis, we show that our controllable captioning model is effective in handling noise. In addition, with two tasks, zero-shot captioning and text-to-image retrieval using generated captions (i.e., self-retrieval), we demonstrate that our model can produce high-quality captions in terms of descriptiveness and distinctiveness. Code is available at https://github.com/kakaobrain/noc.
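
The alignment-conditioned idea in the abstract can be illustrated with a small sketch. It is not taken from the linked repository: the similarity `score`, the `thresholds`, and the `<align_k>` control-token format are all hypothetical, chosen only to show how a discrete alignment level might act as a control signal during training and be fixed to the highest level at inference.

```python
import bisect


def alignment_level(score, thresholds):
    """Map a pair's alignment score to a discrete level, 1 = least aligned.

    `score` is assumed to be some image-text similarity; `thresholds` is an
    ascending list of hypothetical bucket boundaries.
    """
    return bisect.bisect_right(thresholds, score) + 1


def training_text(caption, score, thresholds):
    """Prepend the pair's own alignment level as a control token (training)."""
    return f"<align_{alignment_level(score, thresholds)}> {caption}"


def inference_prefix(thresholds):
    """At inference, request the highest alignment level."""
    return f"<align_{len(thresholds) + 1}>"


thresholds = [0.2, 0.3]  # three levels; the values are illustrative only
noisy = training_text("best deals this week", 0.1, thresholds)
clean = training_text("a dog running on a beach", 0.35, thresholds)
prefix = inference_prefix(thresholds)
```

Under these assumptions, noisy pairs still contribute to training but are tagged with a low level, so no data is filtered out, while generation is steered toward the well-aligned end of the scale.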
