Abstract:Reinforcement learning (RL) is increasingly used to personalize instruction in intelligent tutoring systems, yet the field lacks a formal framework for defining and evaluating pedagogical safety. We introduce a four-layer model of pedagogical safety for educational RL comprising structural, progress, behavioral, and alignment safety and propose the Reward Hacking Severity Index (RHSI) to quantify misalignment between proxy rewards and genuine learning. We evaluate the framework in a controlled simulation of an AI tutoring environment with 120 sessions across four conditions and three learner profiles, totaling 18{,}000 interactions. Results show that an engagement-optimized agent systematically over-selected a high-engagement action with no direct mastery gain, producing strong measured performance but limited learning progress. A multi-objective reward formulation reduced this problem but did not eliminate it, as the agent continued to favor proxy-rewarding behavior in many states. In contrast, a constrained architecture combining prerequisite enforcement and minimum cognitive demand substantially reduced reward hacking, lowering RHSI from 0.317 in the unconstrained multi-objective condition to 0.102. Ablation results further suggest that behavioral safety was the most influential safeguard against repetitive low-value action selection. These findings suggest that reward design alone may be insufficient to ensure pedagogically aligned behavior in educational RL, at least in the simulated environment studied here. More broadly, the paper positions pedagogical safety as an important research problem at the intersection of AI safety and intelligent educational systems.
Abstract:Engagement-optimized adaptive tutoring systems may prioritize short-term behavioral signals over sustained learning outcomes, creating structural incentives for reward hacking in reinforcement learning policies. We formalize this challenge as a constrained Markov decision process (CMDP) with mastery-conditioned feasibility, in which pedagogical safety constraints dynamically restrict admissible actions according to learner mastery and prerequisite structure. We introduce Mastery-Conditioned Constrained Policy Optimization (MC-CPO), a two-timescale primal-dual algorithm that integrates structural action masking with constrained policy optimization. In the tabular regime, we establish feasibility preservation and convergence to stationary feasible points under standard stochastic approximation conditions and derive a safety gap result showing that optimization within the mastery-conditioned feasible set can strictly dominate post-hoc filtering under identical safety budgets. Empirical validation is conducted in minimal and extended tabular environments and in a neural tutoring setting. Across 10 random seeds and one million training steps in the neural regime, MC-CPO satisfies constraint budgets within tolerance, reduces discounted safety costs relative to unconstrained and reward-shaped baselines, and substantially lowers the Reward Hacking Severity Index (RHSI). These results indicate that embedding pedagogical structure directly into the feasible action space provides a principled foundation for mitigating reward hacking in instructional reinforcement learning systems.
Abstract:Dynamic scene reconstruction with Gaussian Splatting has enabled efficient streaming for real-time rendering and free-viewpoint video. However, most pipelines rely on fixed anchor selection such as Farthest Point Sampling (FPS), typically using 8,192 anchors regardless of scene complexity, which over-allocates computation under strict budgets. We propose Efficient Gaussian Streaming (EGS), a plug-in, budget-aware anchor sampler that replaces FPS with a reinforcement-learned policy while keeping the Gaussian streaming reconstruction backbone unchanged. The policy jointly selects an anchor budget and a subset of informative anchors under discrete constraints, balancing reconstruction quality and runtime using spatial features of the Gaussian representation. We evaluate EGS in two settings: fast rendering, which prioritizes runtime efficiency, and high-quality refinement, which enables additional optimization. Experiments on dynamic multi-view datasets show consistent improvements in the quality--efficiency trade-off over FPS sampling. On unseen data, in fast rendering at 256 anchors ($32\times$ fewer than 8,192), EGS improves PSNR by $+0.52$--$0.61$\,dB while running $1.29$--$1.35\times$ faster than IGS@8192 (N3DV and MeetingRoom). In high-quality refinement, EGS remains competitive with the full-anchor baseline at substantially lower anchor budgets. \emph{Code and pretrained checkpoints will be released upon acceptance.} \keywords{4D Gaussian Splatting \and 4D Gaussian Streaming \and Reinforcement Learning}
Abstract:Bitcoin transaction networks are large scale socio- technical systems in which activities are represented through multi-hop interaction patterns. Graph Neural Networks(GNNs) have become a widely adopted tool for analyzing such systems, supporting tasks such as entity detection and transaction classification. Large-scale datasets like Elliptic have allowed for a rise in the analysis of these systems and in tasks such as fraud detection. In these settings, the amount of transactional context available to each node is determined by the neighborhood aggregation and sampling strategies, yet the interaction between these receptive fields and embedding geometry has received limited attention. In this work, we conduct a controlled comparison of Euclidean and tangent-space hyperbolic GNNs for node classification on a large Bitcoin transaction graph. By explicitly varying the neighborhood while keeping the model architecture and dimensionality fixed, we analyze the differences in two embedding spaces. We further examine optimization behavior and observe that joint selection of learning rate and curvature plays a critical role in stabilizing high-dimensional hyperbolic embeddings. Overall, our findings provide practical insights into the role of embedding geometry and neighborhood depth when modeling large-scale transaction networks, informing the deployment of hyperbolic GNNs for computational social systems.
Abstract:Machine learning based network intrusion detection systems are vulnerable to adversarial attacks that degrade classification performance under both gradient-based and distribution shift threat models. Existing defenses typically apply uniform detection strategies, which may not account for heterogeneous attack characteristics. This paper proposes an attack-aware multi-stage defense framework that learns attack-specific detection strategies through a weighted combination of ensemble disagreement, predictive uncertainty, and distributional anomaly signals. Empirical analysis across seven adversarial attack types reveals distinct detection signatures, enabling a two-stage adaptive detection mechanism. Experimental evaluation on a benchmark intrusion detection dataset indicates that the proposed system attains 94.2% area under the receiver operating characteristic curve and improves classification accuracy by 4.5 percentage points and F1-score by 9.0 points over adversarially trained ensembles. Under adaptive white-box attacks with full architectural knowledge, the system appears to maintain 94.4% accuracy with a 4.2% attack success rate, though this evaluation is limited to two adaptive variants and does not constitute a formal robustness guarantee. Cross-dataset validation further suggests that defense effectiveness depends on baseline classifier competence and may vary with feature dimensionality. These results suggest that attack-specific optimization combined with multi-signal integration can provide a practical approach to improving adversarial robustness in machine learning-based intrusion detection systems.
Abstract:Cyberbullying has become a serious and growing concern in todays virtual world. When left unnoticed, it can have adverse consequences for social and mental health. Researchers have explored various types of cyberbullying, but most approaches use single-label classification, assuming that each comment contains only one type of abuse. In reality, a single comment may include overlapping forms such as threats, hate speech, and harassment. Therefore, multilabel detection is both realistic and essential. However, multilabel cyberbullying detection has received limited attention, especially in low-resource languages like Bangla, where robust pre-trained models are scarce. Developing a generalized model with moderate accuracy remains challenging. Transformers offer strong contextual understanding but may miss sequential dependencies, while LSTM models capture temporal flow but lack semantic depth. To address these limitations, we propose a fusion architecture that combines BanglaBERT-Large with a two-layer stacked LSTM. We analyze their behavior to jointly model context and sequence. The model is fine-tuned and evaluated on a publicly available multilabel Bangla cyberbullying dataset covering cyberbully, sexual harassment, threat, and spam. We apply different sampling strategies to address class imbalance. Evaluation uses multiple metrics, including accuracy, precision, recall, F1-score, Hamming loss, Cohens kappa, and AUC-ROC. We employ 5-fold cross-validation to assess the generalization of the architecture.
Abstract:Accurate currency recognition is essential for assistive technologies, particularly for visually impaired individuals who rely on others to identify banknotes. This dependency puts them at risk of fraud and exploitation. To address these challenges, we first build a new Bangladeshi banknote dataset that includes both controlled and real-world scenarios, ensuring a more comprehensive and diverse representation. Next, to enhance the dataset's robustness, we incorporate four additional datasets, including public benchmarks, to cover various complexities and improve the model's generalization. To overcome the limitations of current recognition models, we propose a novel hybrid CNN architecture that combines MobileNetV3-Large and EfficientNetB0 for efficient feature extraction. This is followed by an effective multilayer perceptron (MLP) classifier to improve performance while keeping computational costs low, making the system suitable for resource-constrained devices. The experimental results show that the proposed model achieves 97.95% accuracy on controlled datasets, 92.84% on complex backgrounds, and 94.98% accuracy when combining all datasets. The model's performance is thoroughly evaluated using five-fold cross-validation and seven metrics: accuracy, precision, recall, F1-score, Cohen's Kappa, MCC, and AUC. Additionally, explainable AI methods like LIME and SHAP are incorporated to enhance transparency and interpretability.
Abstract:Deep cognitive attention is characterized by heightened gamma oscillations and coordinated visual behavior. Despite the physiological importance of these mechanisms, computational studies rarely synthesize these modalities or identify the neural regions most responsible for sustained focus. To address this gap, this work introduces Gamma2Patterns, a multimodal framework that characterizes deep cognitive attention by leveraging complementary Gamma and Alpha band EEG activity alongside Eye-tracking measurements. Using the SEED-IV dataset [1], we extract spectral power, burst-based temporal dynamics, and fixation-saccade-pupil signals across 62 channels or electrodes to analyze how neural activation differs between high-focus (Gamma-dominant) and low-focus (Alpha-dominant) states. Our findings reveal that frontopolar, temporal, anterior frontal, and parieto-occipital regions exhibit the strongest Gamma power and burst rates, indicating their dominant role in deep attentional engagement, while Eye-tracking signals confirm complementary contributions from frontal, frontopolar, and frontotemporal regions. Furthermore, we show that Gamma power and burst duration provide more discriminative markers of deep focus than Alpha power alone, demonstrating their value for attention decoding. Collectively, these results establish a multimodal, evidence-based map of cortical regions and oscillatory signatures underlying deep focus, providing a neurophysiological foundation for future brain-inspired attention mechanisms in AI systems.
Abstract:Evaluating image captions requires cohesive assessment of both visual semantics and language pragmatics, which is often not entirely captured by most metrics. We introduce Redemption Score, a novel hybrid framework that ranks image captions by triangulating three complementary signals: (1) Mutual Information Divergence (MID) for global image-text distributional alignment, (2) DINO-based perceptual similarity of cycle-generated images for visual grounding, and (3) BERTScore for contextual text similarity against human references. A calibrated fusion of these signals allows Redemption Score to offer a more holistic assessment. On the Flickr8k benchmark, Redemption Score achieves a Kendall-$\tau$ of 56.43, outperforming twelve prior methods and demonstrating superior correlation with human judgments without requiring task-specific training. Our framework provides a more robust and nuanced evaluation by effectively redeeming image semantics and linguistic interpretability indicated by strong transfer of knowledge in the Conceptual Captions and MS COCO datasets.
Abstract:With the rapid advancement of large language models like Gemini, GPT, and others, bridging the gap between the human brain and language processing has become an important area of focus. To address this challenge, researchers have developed various models to decode EEG signals into text. However, these models still face significant performance limitations. To overcome these shortcomings, we propose a new model, R1 Translator, which aims to improve the performance of EEG-to-text decoding. The R1 Translator model combines a bidirectional LSTM encoder with a pretrained transformer-based decoder, utilizing EEG features to produce high-quality text outputs. The model processes EEG embeddings through the LSTM to capture sequential dependencies, which are then fed into the transformer decoder for effective text generation. The R1 Translator excels in ROUGE metrics, outperforming both T5 (previous research) and Brain Translator. Specifically, R1 achieves a ROUGE-1 score of 38.00% (P), which is up to 9% higher than T5 (34.89%) and 3% better than Brain (35.69%). It also leads in ROUGE-L, with a F1 score of 32.51%, outperforming T5 by 3% (29.67%) and Brain by 2% (30.38%). In terms of CER, R1 achieves a CER of 0.5795, which is 2% lower than T5 (0.5917) and 4% lower than Brain (0.6001). Additionally, R1 performs better in WER with a score of 0.7280, outperforming T5 by 4.3% (0.7610) and Brain by 3.6% (0.7553). Code is available at https://github.com/Mmurrad/EEG-To-text.