Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Shaona Ghosh

Surfacing Semantic Orthogonality Across Model Safety Benchmarks: A Multi-Dimensional Analysis

May 23, 2025

Jonathan Bennion, Shaona Ghosh, Mantek Singh, Nouha Dziri

Abstract:Various AI safety datasets have been developed to measure LLMs against evolving interpretations of harm. Our evaluation of five recently published open-source safety benchmarks reveals distinct semantic clusters using UMAP dimensionality reduction and kmeans clustering (silhouette score: 0.470). We identify six primary harm categories with varying benchmark representation. GretelAI, for example, focuses heavily on privacy concerns, while WildGuardMix emphasizes self-harm scenarios. Significant differences in prompt length distribution suggests confounds to data collection and interpretations of harm as well as offer possible context. Our analysis quantifies benchmark orthogonality among AI benchmarks, allowing for transparency in coverage gaps despite topical similarities. Our quantitative framework for analyzing semantic orthogonality across safety benchmarks enables more targeted development of datasets that comprehensively address the evolving landscape of harms in AI use, however that is defined in the future.

* Computer Science & Information Technology 15 (2025) 27 - 39
* 6th International Conference on Advanced Natural Language Processing (AdNLP 2025), May 17 ~ 18, 2025, Zurich, Switzerland

Via

Access Paper or Ask Questions

Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models

Apr 10, 2025

NVIDIA, :, Aaron Blakeman, Aarti Basant, Abhinav Khattar, Adithya Renduchintala, Akhiad Bercovich, Aleksander Ficek, Alexis Bjorlin, Ali Taghibakhshi(+191 more)

Abstract:As inference-time scaling becomes critical for enhanced reasoning capabilities, it is increasingly becoming important to build models that are efficient to infer. We introduce Nemotron-H, a family of 8B and 56B/47B hybrid Mamba-Transformer models designed to reduce inference cost for a given accuracy level. To achieve this goal, we replace the majority of self-attention layers in the common Transformer model architecture with Mamba layers that perform constant computation and require constant memory per generated token. We show that Nemotron-H models offer either better or on-par accuracy compared to other similarly-sized state-of-the-art open-sourced Transformer models (e.g., Qwen-2.5-7B/72B and Llama-3.1-8B/70B), while being up to 3$\times$ faster at inference. To further increase inference speed and reduce the memory required at inference time, we created Nemotron-H-47B-Base from the 56B model using a new compression via pruning and distillation technique called MiniPuzzle. Nemotron-H-47B-Base achieves similar accuracy to the 56B model, but is 20% faster to infer. In addition, we introduce an FP8-based training recipe and show that it can achieve on par results with BF16-based training. This recipe is used to train the 56B model. All Nemotron-H models will be released, with support in Hugging Face, NeMo, and Megatron-LM.

Via

Access Paper or Ask Questions

Aegis2.0: A Diverse AI Safety Dataset and Risks Taxonomy for Alignment of LLM Guardrails

Jan 15, 2025

Shaona Ghosh, Prasoon Varshney, Makesh Narsimhan Sreedhar, Aishwarya Padmakumar, Traian Rebedea, Jibin Rajan Varghese, Christopher Parisien

Figure 1 for Aegis2.0: A Diverse AI Safety Dataset and Risks Taxonomy for Alignment of LLM Guardrails

Figure 2 for Aegis2.0: A Diverse AI Safety Dataset and Risks Taxonomy for Alignment of LLM Guardrails

Figure 3 for Aegis2.0: A Diverse AI Safety Dataset and Risks Taxonomy for Alignment of LLM Guardrails

Figure 4 for Aegis2.0: A Diverse AI Safety Dataset and Risks Taxonomy for Alignment of LLM Guardrails

Abstract:As Large Language Models (LLMs) and generative AI become increasingly widespread, concerns about content safety have grown in parallel. Currently, there is a clear lack of high-quality, human-annotated datasets that address the full spectrum of LLM-related safety risks and are usable for commercial applications. To bridge this gap, we propose a comprehensive and adaptable taxonomy for categorizing safety risks, structured into 12 top-level hazard categories with an extension to 9 fine-grained subcategories. This taxonomy is designed to meet the diverse requirements of downstream users, offering more granular and flexible tools for managing various risk types. Using a hybrid data generation pipeline that combines human annotations with a multi-LLM "jury" system to assess the safety of responses, we obtain Aegis 2.0, a carefully curated collection of 34,248 samples of human-LLM interactions, annotated according to our proposed taxonomy. To validate its effectiveness, we demonstrate that several lightweight models, trained using parameter-efficient techniques on Aegis 2.0, achieve performance competitive with leading safety models fully fine-tuned on much larger, non-commercial datasets. In addition, we introduce a novel training blend that combines safety with topic following data.This approach enhances the adaptability of guard models, enabling them to generalize to new risk categories defined during inference. We plan to open-source Aegis 2.0 data and models to the research community to aid in the safety guardrailing of LLMs.

* arXiv admin note: text overlap with arXiv:2404.05993

Via

Access Paper or Ask Questions

Towards Inference-time Category-wise Safety Steering for Large Language Models

Oct 02, 2024

Amrita Bhattacharjee, Shaona Ghosh, Traian Rebedea, Christopher Parisien

Figure 1 for Towards Inference-time Category-wise Safety Steering for Large Language Models

Figure 2 for Towards Inference-time Category-wise Safety Steering for Large Language Models

Figure 3 for Towards Inference-time Category-wise Safety Steering for Large Language Models

Figure 4 for Towards Inference-time Category-wise Safety Steering for Large Language Models

Abstract:While large language models (LLMs) have seen unprecedented advancements in capabilities and applications across a variety of use-cases, safety alignment of these models is still an area of active research. The fragile nature of LLMs, even models that have undergone extensive alignment and safety training regimes, warrants additional safety steering steps via training-free, inference-time methods. While recent work in the area of mechanistic interpretability has investigated how activations in latent representation spaces may encode concepts, and thereafter performed representation engineering to induce such concepts in LLM outputs, the applicability of such for safety is relatively under-explored. Unlike recent inference-time safety steering works, in this paper we explore safety steering of LLM outputs using: (i) category-specific steering vectors, thereby enabling fine-grained control over the steering, and (ii) sophisticated methods for extracting informative steering vectors for more effective safety steering while retaining quality of the generated text. We demonstrate our exploration on multiple LLMs and datasets, and showcase the effectiveness of the proposed steering method, along with a discussion on the implications and best practices.

Via

Access Paper or Ask Questions

Nemotron-4 340B Technical Report

Jun 17, 2024

Nvidia, :, Bo Adler, Niket Agarwal, Ashwath Aithal, Dong H. Anh, Pallab Bhattacharya, Annika Brundyn, Jared Casper, Bryan Catanzaro(+73 more)

Figure 1 for Nemotron-4 340B Technical Report

Figure 2 for Nemotron-4 340B Technical Report

Figure 3 for Nemotron-4 340B Technical Report

Figure 4 for Nemotron-4 340B Technical Report

Abstract:We release the Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward. Our models are open access under the NVIDIA Open Model License Agreement, a permissive model license that allows distribution, modification, and use of the models and its outputs. These models perform competitively to open access models on a wide range of evaluation benchmarks, and were sized to fit on a single DGX H100 with 8 GPUs when deployed in FP8 precision. We believe that the community can benefit from these models in various research studies and commercial applications, especially for generating synthetic data to train smaller language models. Notably, over 98% of data used in our model alignment process is synthetically generated, showcasing the effectiveness of these models in generating synthetic data. To further support open research and facilitate model development, we are also open-sourcing the synthetic data generation pipeline used in our model alignment process.

Via

Access Paper or Ask Questions

AEGIS: Online Adaptive AI Content Safety Moderation with Ensemble of LLM Experts

Apr 09, 2024

Shaona Ghosh, Prasoon Varshney, Erick Galinkin, Christopher Parisien

Figure 1 for AEGIS: Online Adaptive AI Content Safety Moderation with Ensemble of LLM Experts

Figure 2 for AEGIS: Online Adaptive AI Content Safety Moderation with Ensemble of LLM Experts

Figure 3 for AEGIS: Online Adaptive AI Content Safety Moderation with Ensemble of LLM Experts

Figure 4 for AEGIS: Online Adaptive AI Content Safety Moderation with Ensemble of LLM Experts

Abstract:As Large Language Models (LLMs) and generative AI become more widespread, the content safety risks associated with their use also increase. We find a notable deficiency in high-quality content safety datasets and benchmarks that comprehensively cover a wide range of critical safety areas. To address this, we define a broad content safety risk taxonomy, comprising 13 critical risk and 9 sparse risk categories. Additionally, we curate AEGISSAFETYDATASET, a new dataset of approximately 26, 000 human-LLM interaction instances, complete with human annotations adhering to the taxonomy. We plan to release this dataset to the community to further research and to help benchmark LLM models for safety. To demonstrate the effectiveness of the dataset, we instruction-tune multiple LLM-based safety models. We show that our models (named AEGISSAFETYEXPERTS), not only surpass or perform competitively with the state-of-the-art LLM-based safety models and general purpose LLMs, but also exhibit robustness across multiple jail-break attack categories. We also show how using AEGISSAFETYDATASET during the LLM alignment phase does not negatively impact the performance of the aligned models on MT Bench scores. Furthermore, we propose AEGIS, a novel application of a no-regret online adaptation framework with strong theoretical guarantees, to perform content moderation with an ensemble of LLM content safety experts in deployment

Via

Access Paper or Ask Questions

CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues

Apr 04, 2024

Makesh Narsimhan Sreedhar, Traian Rebedea, Shaona Ghosh, Christopher Parisien

Abstract:Recent advancements in instruction-tuning datasets have predominantly focused on specific tasks like mathematical or logical reasoning. There has been a notable gap in data designed for aligning language models to maintain topic relevance in conversations - a critical aspect for deploying chatbots to production. We introduce the CantTalkAboutThis dataset to help language models remain focused on the subject at hand during task-oriented interactions. It consists of synthetic dialogues on a wide range of conversation topics from different domains. These dialogues are interspersed with distractor turns that intentionally divert the chatbot from the predefined topic. Fine-tuning language models on this dataset helps make them resilient to deviating from the role assigned and improves their ability to maintain topical coherence compared to general-purpose instruction-tuned LLMs like GPT-4-turbo and Mixtral-Instruct. Additionally, preliminary observations suggest that training models on this dataset also enhance their performance on fine-grained instruction following tasks.

Via

Access Paper or Ask Questions

Joint Learning of Portrait Intrinsic Decomposition and Relighting

Jun 22, 2021

Mona Zehni, Shaona Ghosh, Krishna Sridhar, Sethu Raman

Figure 1 for Joint Learning of Portrait Intrinsic Decomposition and Relighting

Figure 2 for Joint Learning of Portrait Intrinsic Decomposition and Relighting

Figure 3 for Joint Learning of Portrait Intrinsic Decomposition and Relighting

Figure 4 for Joint Learning of Portrait Intrinsic Decomposition and Relighting

Abstract:Inverse rendering is the problem of decomposing an image into its intrinsic components, i.e. albedo, normal and lighting. To solve this ill-posed problem from single image, state-of-the-art methods in shape from shading mostly resort to supervised training on all the components on either synthetic or real datasets. Here, we propose a new self-supervised training paradigm that 1) reduces the need for full supervision on the decomposition task and 2) takes into account the relighting task. We introduce new self-supervised loss terms that leverage the consistencies between multi-lit images (images of the same scene under different illuminations). Our approach is applicable to multi-lit datasets. We apply our training approach in two settings: 1) train on a mixture of synthetic and real data, 2) train on real datasets with limited supervision. We show-case the effectiveness of our training paradigm on both intrinsic decomposition and relighting and demonstrate how the model struggles in both tasks without the self-supervised loss terms in limited supervision settings. We provide results of comprehensive experiments on SfSNet, CelebA and Photoface datasets and verify the performance of our approach on images in the wild.

Via

Access Paper or Ask Questions

Neural Networks for Text Correction and Completion in Keyboard Decoding

Sep 19, 2017

Shaona Ghosh, Per Ola Kristensson

Figure 1 for Neural Networks for Text Correction and Completion in Keyboard Decoding

Figure 2 for Neural Networks for Text Correction and Completion in Keyboard Decoding

Figure 3 for Neural Networks for Text Correction and Completion in Keyboard Decoding

Figure 4 for Neural Networks for Text Correction and Completion in Keyboard Decoding

Abstract:Despite the ubiquity of mobile and wearable text messaging applications, the problem of keyboard text decoding is not tackled sufficiently in the light of the enormous success of the deep learning Recurrent Neural Network (RNN) and Convolutional Neural Networks (CNN) for natural language understanding. In particular, considering that the keyboard decoders should operate on devices with memory and processor resource constraints, makes it challenging to deploy industrial scale deep neural network (DNN) models. This paper proposes a sequence-to-sequence neural attention network system for automatic text correction and completion. Given an erroneous sequence, our model encodes character level hidden representations and then decodes the revised sequence thus enabling auto-correction and completion. We achieve this by a combination of character level CNN and gated recurrent unit (GRU) encoder along with and a word level gated recurrent unit (GRU) attention decoder. Unlike traditional language models that learn from billions of words, our corpus size is only 12 million words; an order of magnitude smaller. The memory footprint of our learnt model for inference and prediction is also an order of magnitude smaller than the conventional language model based text decoders. We report baseline performance for neural keyboard decoders in such limited domain. Our models achieve a word level accuracy of $90\%$ and a character error rate CER of $2.4\%$ over the Twitter typo dataset. We present a novel dataset of noisy to corrected mappings by inducing the noise distribution from the Twitter data over the OpenSubtitles 2009 dataset; on which our model predicts with a word level accuracy of $98\%$ and sequence accuracy of $68.9\%$. In our user study, our model achieved an average CER of $2.6\%$ with the state-of-the-art non-neural touch-screen keyboard decoder at CER of $1.6\%$.

Via

Access Paper or Ask Questions

An Application of Network Lasso Optimization For Ride Sharing Prediction

Jul 06, 2016

Shaona Ghosh, Kevin Page, David De Roure

Figure 1 for An Application of Network Lasso Optimization For Ride Sharing Prediction

Figure 2 for An Application of Network Lasso Optimization For Ride Sharing Prediction

Figure 3 for An Application of Network Lasso Optimization For Ride Sharing Prediction

Figure 4 for An Application of Network Lasso Optimization For Ride Sharing Prediction

Abstract:Ride sharing has important implications in terms of environmental, social and individual goals by reducing carbon footprints, fostering social interactions and economizing commuter costs. The ride sharing systems that are commonly available lack adaptive and scalable techniques that can simultaneously learn from the large scale data and predict in real-time dynamic fashion. In this paper, we study such a problem towards a smart city initiative, where a generic ride sharing system is conceived capable of making predictions about ride share opportunities based on the historically recorded data while satisfying real-time ride requests. Underpinning the system is an application of a powerful machine learning convex optimization framework called Network Lasso that uses the Alternate Direction Method of Multipliers (ADMM) optimization for learning and dynamic prediction. We propose an application of a robust and scalable unified optimization framework within the ride sharing case-study. The application of Network Lasso framework is capable of jointly optimizing and clustering different rides based on their spatial and model similarity. The prediction from the framework clusters new ride requests, making accurate price prediction based on the clusters, detecting hidden correlations in the data and allowing fast convergence due to the network topology. We provide an empirical evaluation of the application of ADMM network Lasso on real trip record and simulated data, proving their effectiveness since the mean squared error of the algorithm's prediction is minimized on the test rides.

Via

Access Paper or Ask Questions