Vinh Nguyen

Llama-Nemotron: Efficient Reasoning Models

May 02, 2025

Akhiad Bercovich, Itay Levy, Izik Golan, Mohammad Dabbah, Ran El-Yaniv, Omri Puny, Ido Galil, Zach Moshe, Tomer Ronen, Najeeb Nabwani(+121 more)

Abstract:We introduce the Llama-Nemotron series of models, an open family of heterogeneous reasoning models that deliver exceptional reasoning capabilities, inference efficiency, and an open license for enterprise use. The family comes in three sizes -- Nano (8B), Super (49B), and Ultra (253B) -- and performs competitively with state-of-the-art reasoning models such as DeepSeek-R1 while offering superior inference throughput and memory efficiency. In this report, we discuss the training procedure for these models, which entails using neural architecture search from Llama 3 models for accelerated inference, knowledge distillation, and continued pretraining, followed by a reasoning-focused post-training stage consisting of two main parts: supervised fine-tuning and large scale reinforcement learning. Llama-Nemotron models are the first open-source models to support a dynamic reasoning toggle, allowing users to switch between standard chat and reasoning modes during inference. To further support open research and facilitate model development, we provide the following resources: 1. We release the Llama-Nemotron reasoning models -- LN-Nano, LN-Super, and LN-Ultra -- under the commercially permissive NVIDIA Open Model License Agreement. 2. We release the complete post-training dataset: Llama-Nemotron-Post-Training-Dataset. 3. We also release our training codebases: NeMo, NeMo-Aligner, and Megatron-LM.

Multimodal Object Detection using Depth and Image Data for Manufacturing Parts

Nov 13, 2024

Nazanin Mahjourian, Vinh Nguyen

Abstract:Manufacturing requires reliable object detection methods for precise picking and handling of diverse types of manufacturing parts and components. Traditional object detection methods utilize either only 2D images from cameras or 3D data from lidars or similar 3D sensors. However, each of these sensors has weaknesses and limitations. Cameras do not have depth perception and 3D sensors typically do not carry color information. These weaknesses can undermine the reliability and robustness of industrial manufacturing systems. To address these challenges, this work proposes a multi-sensor system combining a red-green-blue (RGB) camera and a 3D point cloud sensor. The two sensors are calibrated for precise alignment of the multimodal data captured from the two hardware devices. A novel multimodal object detection method is developed to process both RGB and depth data. This object detector is based on the Faster R-CNN baseline that was originally designed to process only camera images. The results show that the multimodal model significantly outperforms the depth-only and RGB-only baselines on established object detection metrics. More specifically, the multimodal model improves mAP by 13% and raises Mean Precision by 11.8% in comparison to the RGB-only baseline. Compared to the depth-only baseline, it improves mAP by 78% and raises Mean Precision by 57%. Hence, this method facilitates more reliable and robust object detection in service to smart manufacturing applications.

PerspectiveNet: Multi-View Perception for Dynamic Scene Understanding

Oct 22, 2024

Vinh Nguyen

Abstract:Generating detailed descriptions from multiple cameras and viewpoints is challenging due to the complex and inconsistent nature of visual data. In this paper, we introduce PerspectiveNet, a lightweight yet efficient model for generating long descriptions across multiple camera views. Our approach utilizes a vision encoder, a compact connector module to convert visual features into a fixed-size tensor, and large language models (LLMs) to harness the strong natural language generation capabilities of LLMs. The connector module is designed with three main goals: mapping visual features onto LLM embeddings, emphasizing key information needed for description generation, and producing a fixed-size feature matrix. Additionally, we augment our solution with a secondary task, the correct frame sequence detection, enabling the model to search for the correct sequence of frames to generate descriptions. Finally, we integrate the connector module, the secondary task, the LLM, and a visual feature extraction model into a single architecture, which is trained for the Traffic Safety Description and Analysis task. This task requires generating detailed, fine-grained descriptions of events from multiple cameras and viewpoints. The resulting model is lightweight, ensuring efficient training and inference, while remaining highly effective.

* 6 pages, 2 figures

Solving the Right Problem is Key for Translational NLP: A Case Study in UMLS Vocabulary Insertion

Nov 25, 2023

Bernal Jimenez Gutierrez, Yuqing Mao, Vinh Nguyen, Kin Wah Fung, Yu Su, Olivier Bodenreider

Abstract:As the immense opportunities enabled by large language models become more apparent, NLP systems will be increasingly expected to excel in real-world settings. However, in many instances, powerful models alone will not yield translational NLP solutions, especially if the formulated problem is not well aligned with the real-world task. In this work, we study the case of UMLS vocabulary insertion, an important real-world task in which hundreds of thousands of new terms, referred to as atoms, are added to the UMLS, one of the most comprehensive open-source biomedical knowledge bases. Previous work aimed to develop an automated NLP system to make this time-consuming, costly, and error-prone task more efficient. Nevertheless, practical progress in this direction has been difficult to achieve due to a problem formulation and evaluation gap between research output and the real-world task. In order to address this gap, we introduce a new formulation for UMLS vocabulary insertion which mirrors the real-world task, datasets which faithfully represent it and several strong baselines we developed through re-purposing existing solutions. Additionally, we propose an effective rule-enhanced biomedical language model which enables important new model behavior, outperforms all strong baselines and provides measurable qualitative improvements to editors who carry out the UVI task. We hope this case study provides insight into the considerable importance of problem formulation for the success of translational NLP solutions.

* EMNLP 2023 Findings; Code is available at https://github.com/OSU-NLP-Group/UMLS-Vocabulary-Insertion

p-Laplacian Transformer

Nov 06, 2023

Tuan Nguyen, Tam Nguyen, Vinh Nguyen, Tan M. Nguyen

Abstract:$p$-Laplacian regularization, rooted in graph and image signal processing, introduces a parameter $p$ to control the regularization effect on these data. Smaller values of $p$ promote sparsity and interpretability, while larger values encourage smoother solutions. In this paper, we first show that the self-attention mechanism obtains the minimal Laplacian regularization ($p=2$) and encourages smoothness in the architecture. However, this smoothness is not suitable for the heterophilic structure of self-attention in transformers, where attention weights between tokens that are in close proximity and those that are not are assigned indistinguishably. From that insight, we then propose a novel class of transformers, namely the $p$-Laplacian Transformer (p-LaT), which leverages the $p$-Laplacian regularization framework to harness the heterophilic features within self-attention layers. In particular, low $p$ values will effectively assign higher attention weights to tokens that are in close proximity to the current token being processed. We empirically demonstrate the advantages of p-LaT over the baseline transformers on a wide range of benchmark datasets.
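
The effect of $p$ described above can be illustrated with a small sketch. This is not the p-LaT method from the paper: it assumes a generic, unnormalized energy on a path graph, the sum of |u[i+1] - u[i]|**p over neighbouring entries, and the function name `p_energy` and the two test signals are hypothetical choices made only for illustration.

```python
def p_energy(signal, p):
    """Sum of |u[i+1] - u[i]|**p over the edges of a path graph (illustrative only)."""
    return sum(abs(b - a) ** p for a, b in zip(signal, signal[1:]))


# Two signals that rise from 0 to 1 over the same six nodes.
step = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]  # one sharp jump (sparse differences)
ramp = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]  # the same rise spread evenly (smooth)

for p in (1.0, 2.0):
    print(f"p={p}: step={p_energy(step, p):.3f}, ramp={p_energy(ramp, p):.3f}")
```

Under this assumed energy, $p=2$ penalizes the sharp jump about five times more than the ramp, so minimizing it favours smooth signals, whereas $p=1$ scores both signals about equally and so no longer discourages a single sharp change.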

From Coupled Oscillators to Graph Neural Networks: Reducing Over-smoothing via a Kuramoto Model-based Approach

Nov 06, 2023

Tuan Nguyen, Tan M. Nguyen, Hirotada Honda, Takashi Sano, Vinh Nguyen, Shugo Nakamura

Abstract:We propose the Kuramoto Graph Neural Network (KuramotoGNN), a novel class of continuous-depth graph neural networks (GNNs) that employs the Kuramoto model to mitigate the over-smoothing phenomenon, in which node features in GNNs become indistinguishable as the number of layers increases. The Kuramoto model captures the synchronization behavior of non-linear coupled oscillators. Under the view of coupled oscillators, we first show the connection between the Kuramoto model and a basic GNN, and then show that the over-smoothing phenomenon in GNNs can be interpreted as phase synchronization in the Kuramoto model. The KuramotoGNN replaces this phase synchronization with frequency synchronization to prevent the node features from converging into each other while allowing the system to reach a stable synchronized state. We experimentally verify the advantages of the KuramotoGNN over the baseline GNNs and existing methods in reducing over-smoothing on various graph deep learning benchmark tasks.
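
The phase synchronization mentioned above can be seen in a minimal simulation of the classic Kuramoto model, d(theta_i)/dt = omega_i + (K/N) * sum_j sin(theta_j - theta_i). This sketch is not the KuramotoGNN architecture: it assumes all-to-all coupling, identical natural frequencies, and explicit Euler integration, and the function names, coupling strength, step size, and step count are hypothetical values chosen for illustration.

```python
import cmath
import math
import random


def order_parameter(phases):
    """Kuramoto order parameter r in [0, 1]; r near 1 means the phases coincide."""
    return abs(sum(cmath.exp(1j * p) for p in phases) / len(phases))


def kuramoto_step(phases, freqs, coupling, dt):
    """One explicit Euler step of the all-to-all Kuramoto model."""
    n = len(phases)
    updated = []
    for i in range(n):
        pull = sum(math.sin(phases[j] - phases[i]) for j in range(n))
        updated.append(phases[i] + dt * (freqs[i] + coupling / n * pull))
    return updated


def simulate(phases, freqs, coupling=2.0, dt=0.05, steps=2000):
    """Integrate the model for `steps` Euler steps and return the final phases."""
    for _ in range(steps):
        phases = kuramoto_step(phases, freqs, coupling, dt)
    return phases


rng = random.Random(0)  # fixed seed so the run is repeatable
initial = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(10)]
identical_freqs = [1.0] * 10  # assumed: every oscillator has the same frequency

final = simulate(initial, identical_freqs)
r_initial = order_parameter(initial)
r_final = order_parameter(final)
print(f"order parameter before: {r_initial:.3f}, after: {r_final:.3f}")
```

With identical frequencies the oscillators collapse onto a single phase and the order parameter approaches 1, which is the oscillator analogue of node features becoming indistinguishable.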

Revisiting Over-smoothing and Over-squashing using Ollivier's Ricci Curvature

Nov 28, 2022

Khang Nguyen, Tan Nguyen, Nhat Ho, Khuong Nguyen, Hieu Nong, Vinh Nguyen

Abstract:Graph Neural Networks (GNNs) have been shown to be inherently susceptible to the problems of over-smoothing and over-squashing. These issues hinder the ability of GNNs to model complex graph interactions by limiting their effectiveness at taking into account distant information. Our study reveals the key connection between the local graph geometry and the occurrence of both of these issues, thereby providing a unified framework for studying them at a local scale using Ollivier's Ricci curvature. Based on our theory, a number of principled methods are proposed to alleviate the over-smoothing and over-squashing issues.

* 19 pages, 4 figures

VidConv: A modernized 2D ConvNet for Efficient Video Recognition

Jul 08, 2022

Chuong H. Nguyen, Su Huynh, Vinh Nguyen, Ngoc Nguyen

Abstract:Since being introduced in 2020, Vision Transformers (ViTs) have been steadily breaking records on many vision tasks and are often described as "all you need" to replace ConvNets. Despite that, ViTs are generally computationally expensive, memory-intensive, and unfriendly to embedded devices. In addition, recent research shows that a standard ConvNet, if redesigned and trained appropriately, can compete favorably with ViTs in terms of accuracy and scalability. In this paper, we adopt the modernized structure of ConvNets to design a new backbone for action recognition. In particular, our main target is industrial product deployment, such as on FPGA boards in which only standard operations are supported. Therefore, our network consists simply of 2D convolutions, without using any 3D convolution, long-range attention plugin, or Transformer blocks. While being trained with far fewer epochs (5x-10x fewer), our backbone surpasses the methods using (2+1)D and 3D convolution, and achieves comparable results with ViTs on two benchmark datasets.

UVA Resources for the Biomedical Vocabulary Alignment at Scale in the UMLS Metathesaurus

May 21, 2022

Vinh Nguyen, Olivier Bodenreider

Abstract:The construction and maintenance process of the UMLS (Unified Medical Language System) Metathesaurus is time-consuming, costly, and error-prone as it relies on (1) the lexical and semantic processing for suggesting synonymous terms, and (2) the expertise of UMLS editors for curating the suggestions. For improving the UMLS Metathesaurus construction process, our research group has defined a new task called UVA (UMLS Vocabulary Alignment) and generated a dataset for evaluating the task. Our group has also developed different baselines for this task using logical rules (RBA), and neural networks (LexLM and ConLM). In this paper, we present a set of reusable and reproducible resources including (1) a dataset generator, (2) three datasets generated by using the generator, and (3) three baseline approaches. We describe the UVA dataset generator and its implementation generalized for any given UMLS release. We demonstrate the use of the dataset generator by generating datasets corresponding to three UMLS releases, 2020AA, 2021AA, and 2021AB. We provide three UVA baselines using the three existing approaches (LexLM, ConLM, and RBA). The code, the datasets, and the experiments are publicly available, reusable, and reproducible with any UMLS release (a no-cost license agreement is required for downloading the UMLS).

UBERT: A Novel Language Model for Synonymy Prediction at Scale in the UMLS Metathesaurus

Apr 27, 2022

Thilini Wijesiriwardene, Vinh Nguyen, Goonmeet Bajaj, Hong Yung Yip, Vishesh Javangula, Yuqing Mao, Kin Wah Fung, Srinivasan Parthasarathy, Amit P. Sheth, Olivier Bodenreider

Abstract:The UMLS Metathesaurus integrates more than 200 biomedical source vocabularies. During the Metathesaurus construction process, synonymous terms are clustered into concepts by human editors, assisted by lexical similarity algorithms. This process is error-prone and time-consuming. Recently, a deep learning model (LexLM) has been developed for the UMLS Vocabulary Alignment (UVA) task. This work introduces UBERT, a BERT-based language model, pretrained on UMLS terms via a supervised Synonymy Prediction (SP) task replacing the original Next Sentence Prediction (NSP) task. The effectiveness of UBERT for UMLS Metathesaurus construction process is evaluated using the UMLS Vocabulary Alignment (UVA) task. We show that UBERT outperforms the LexLM, as well as biomedical BERT-based models. Key to the performance of UBERT are the synonymy prediction task specifically developed for UBERT, the tight alignment of training data to the UVA task, and the similarity of the models used for pretrained UBERT.
