Muhammad Amien Ibrahim

Dual-Class Prompt Generation: Enhancing Indonesian Gender-Based Hate Speech Detection through Data Augmentation

Mar 06, 2025

Muhammad Amien Ibrahim, Faisal, Tora Sangputra Yopie Winarto, Zefanya Delvin Sulistiya

Abstract: Detecting gender-based hate speech in Indonesian social media remains challenging due to limited labeled datasets. While binary hate speech classification has advanced, more granular categories such as gender-targeted hate speech are understudied because of class imbalance. This paper addresses this gap by comparing three data augmentation techniques for Indonesian gender-based hate speech detection. We evaluate backtranslation, single-class prompt generation (using only hate speech examples), and our proposed dual-class prompt generation (using both hate speech and non-hate speech examples). Experiments show that all augmentation methods improve classification performance, with our dual-class approach achieving the best results (88.5% accuracy, 88.1% F1-score using Random Forest). Semantic similarity analysis reveals that dual-class prompt generation produces the most novel content, and t-SNE visualizations confirm that these samples occupy distinct regions of feature space while maintaining class characteristics. Our findings suggest that incorporating examples from both classes helps language models generate more diverse yet representative samples, effectively addressing limited data challenges in specialized hate speech detection.

* Accepted to the 8th World Conference on Computing and Communication Technologies (WCCCT 2025)
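
The difference between single-class and dual-class prompt generation described in the abstract can be illustrated with a minimal sketch. The function name `build_prompt`, the prompt wording, and the placeholder examples are assumptions made for illustration; the abstract does not give the authors' actual prompt or the language model they used.

```python
from typing import Sequence


def build_prompt(
    hate_examples: Sequence[str],
    non_hate_examples: Sequence[str] = (),
    n_new: int = 5,
) -> str:
    """Assemble a few-shot augmentation prompt (hypothetical wording).

    With non_hate_examples empty, this is a single-class prompt.
    With both lists filled, it is a dual-class prompt.
    """
    lines = ["Below are labeled Indonesian social media posts.", ""]
    lines.append("Label: gender-based hate speech")
    lines += [f"- {text}" for text in hate_examples]
    if non_hate_examples:
        # The dual-class variant also shows the contrasting class.
        lines += ["", "Label: not hate speech"]
        lines += [f"- {text}" for text in non_hate_examples]
    lines += [
        "",
        f"Write {n_new} new posts for the label 'gender-based hate speech' "
        "that differ in wording from the examples above.",
    ]
    return "\n".join(lines)


# Placeholder strings stand in for real dataset samples.
single_class = build_prompt(["<hate sample 1>", "<hate sample 2>"])
dual_class = build_prompt(
    ["<hate sample 1>", "<hate sample 2>"],
    ["<non-hate sample 1>", "<non-hate sample 2>"],
)
```

The resulting string would then be sent to a language model of choice, and the generated posts added to the minority class before training the classifier.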

Hybrid Deep Learning for Legal Text Analysis: Predicting Punishment Durations in Indonesian Court Rulings

Oct 26, 2024

Muhammad Amien Ibrahim, Alif Tri Handoyo, Maria Susan Anggreainy

Abstract: Limited public understanding of legal processes and inconsistent verdicts in the Indonesian court system have led to widespread dissatisfaction and increased stress on judges. This study addresses these issues by developing a deep learning-based predictive system for court sentence lengths. Our hybrid model, combining CNN and BiLSTM with an attention mechanism, achieved an R-squared score of 0.5893, effectively capturing both local patterns and long-term dependencies in legal texts. While document summarization proved ineffective, using only the top 30% most frequent tokens improved prediction performance, suggesting that focusing on core legal terminology balances information retention and computational efficiency. We also implemented a modified text normalization process, addressing common errors like misspellings and incorrectly merged words, which significantly improved the model's performance. These findings have important implications for automating legal document processing, aiding both professionals and the public in understanding court judgments. By leveraging advanced NLP techniques, this research contributes to enhancing transparency and accessibility in the Indonesian legal system, paving the way for more consistent and comprehensible legal decisions.

* 11 pages, 7 figures, 6 tables, submitted to Journal of Advances in Information Technology
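
The token-filtering step mentioned in the abstract can be sketched as follows. This assumes that "top 30% most frequent tokens" means the top 30% of distinct vocabulary entries ranked by corpus frequency; the abstract does not specify the exact definition, and the function names here are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, List, Set


def top_fraction_vocab(docs: Iterable[List[str]], fraction: float = 0.3) -> Set[str]:
    """Return the most frequent `fraction` of distinct tokens in the corpus."""
    counts = Counter(tok for doc in docs for tok in doc)
    # Round up so that at least one token is kept for a non-empty corpus.
    keep = max(1, math.ceil(len(counts) * fraction))
    return {tok for tok, _ in counts.most_common(keep)}


def filter_docs(docs: Iterable[List[str]], vocab: Set[str]) -> List[List[str]]:
    """Drop every token that is not in the retained vocabulary."""
    return [[tok for tok in doc if tok in vocab] for doc in docs]


# Toy tokenized documents; real input would be normalized court rulings.
docs = [["a", "a", "b"], ["a", "c", "b", "d"]]
vocab = top_fraction_vocab(docs, fraction=0.3)
filtered = filter_docs(docs, vocab)
```

In this toy corpus there are four distinct tokens, so a fraction of 0.3 rounds up to two retained tokens, and the rarer ones are removed before the text reaches the model.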
