Abstract: State-of-the-art retrieval models typically address a straightforward search scenario, where retrieval tasks are fixed (e.g., finding a passage to answer a specific question) and only a single modality is supported for both queries and retrieved results. This paper introduces techniques for advancing information retrieval with multimodal large language models (MLLMs), enabling a broader search scenario, termed universal multimodal retrieval, in which multiple modalities and diverse retrieval tasks are accommodated. To this end, we first study fine-tuning an MLLM as a bi-encoder retriever on 10 datasets spanning 16 retrieval tasks. Our empirical results show that the fine-tuned MLLM retriever is capable of understanding challenging queries composed of both text and image, but underperforms a smaller CLIP retriever on cross-modal retrieval tasks due to the modality bias of MLLMs. To address this issue, we propose modality-aware hard negative mining to mitigate the modality bias exhibited by MLLM retrievers. Second, we continually fine-tune the universal multimodal retriever to enhance its text retrieval capability while maintaining its multimodal retrieval capability. As a result, our model, MM-Embed, achieves state-of-the-art performance on the multimodal retrieval benchmark M-BEIR, which spans multiple domains and tasks, while also surpassing the state-of-the-art text retrieval model, NV-Embed-v1, on the MTEB retrieval benchmark. Finally, we explore prompting off-the-shelf MLLMs as zero-shot rerankers to refine the ranking of candidates returned by the multimodal retriever. We find that, through prompting and reranking, MLLMs can further improve multimodal retrieval when user queries (e.g., text-image composed queries) are more complex and challenging to understand. These findings also pave the way for advancing universal multimodal retrieval in the future.
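One plausible reading of modality-aware hard negative mining is sketched below: candidates retrieved by an initial (biased) retriever whose modality does not match the task's target modality are kept as hard negatives, so contrastive training explicitly penalizes the modality bias. The data layout, names, and selection rule are illustrative assumptions, not the paper's actual procedure.

```python
# Hypothetical sketch of modality-aware hard negative mining (not the paper's code).
from dataclasses import dataclass

@dataclass
class Candidate:
    doc_id: str
    modality: str   # e.g. "text", "image", or "image,text"
    score: float    # similarity under the current retriever

def mine_modality_aware_negatives(candidates, positive_ids, target_modality, k=8):
    """Return up to k hard negatives, preferring wrong-modality candidates."""
    non_positives = [c for c in candidates if c.doc_id not in positive_ids]
    wrong_modality = [c for c in non_positives if c.modality != target_modality]
    right_modality = [c for c in non_positives if c.modality == target_modality]
    # Rank each pool by retriever score (hardest first) and fill the budget.
    pool = sorted(wrong_modality, key=lambda c: -c.score) + \
           sorted(right_modality, key=lambda c: -c.score)
    return pool[:k]

# Example: a text-to-image task where a biased retriever over-ranks text passages.
cands = [Candidate("t1", "text", 0.91), Candidate("i7", "image", 0.88),
         Candidate("t4", "text", 0.85), Candidate("i2", "image", 0.80)]
print([c.doc_id for c in mine_modality_aware_negatives(cands, {"i7"}, "image")])
```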
Abstract: Decoder-only large language model (LLM)-based embedding models are beginning to outperform BERT- or T5-based embedding models on general-purpose text embedding tasks, including dense vector-based retrieval. In this work, we introduce the NV-Embed model, with a variety of architectural designs and training procedures that significantly enhance the performance of LLMs as versatile embedding models while maintaining their simplicity and reproducibility. For the model architecture, we propose a latent attention layer to obtain pooled embeddings, which consistently improves retrieval and downstream task accuracy compared to mean pooling or using the last <EOS> token embedding from LLMs. To enhance representation learning, we remove the causal attention mask of LLMs during contrastive training. For model training, we introduce a two-stage contrastive instruction-tuning method. It first applies contrastive training with instructions on retrieval datasets, utilizing in-batch negatives and curated hard negative examples. In the second stage, it blends various non-retrieval datasets into instruction tuning, which not only enhances non-retrieval task accuracy but also improves retrieval performance. Combining these techniques, our NV-Embed model, using only publicly available data, achieves a record-high score of 69.32, ranking No. 1 on the Massive Text Embedding Benchmark (MTEB) (as of May 24, 2024) across 56 tasks encompassing retrieval, reranking, classification, clustering, and semantic textual similarity. Notably, our model also attains the highest score of 59.36 on the 15 retrieval tasks in the MTEB benchmark (also known as BEIR). We will open-source the model at: https://huggingface.co/nvidia/NV-Embed-v1.
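The sketch below illustrates one way a latent attention pooling layer can be realized: token hidden states attend (single-head, for brevity) to a small set of trainable latent vectors, the result is passed through an MLP, and a masked mean over the sequence yields one embedding. Dimensions, head count, and initialization are assumptions; the actual NV-Embed layer may differ in these details.

```python
# A minimal latent-attention pooling sketch (assumed shapes, single-head attention).
import torch
import torch.nn as nn

class LatentAttentionPooling(nn.Module):
    def __init__(self, hidden_dim=4096, num_latents=512, latent_dim=4096):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(num_latents, latent_dim) * 0.02)
        self.q_proj = nn.Linear(hidden_dim, latent_dim, bias=False)
        self.mlp = nn.Sequential(nn.Linear(latent_dim, latent_dim * 4),
                                 nn.GELU(),
                                 nn.Linear(latent_dim * 4, latent_dim))

    def forward(self, hidden_states, attention_mask):
        # hidden_states: (batch, seq_len, hidden_dim); attention_mask: (batch, seq_len)
        q = self.q_proj(hidden_states)                        # queries come from tokens
        attn = torch.softmax(q @ self.latents.T / q.shape[-1] ** 0.5, dim=-1)
        o = self.mlp(attn @ self.latents)                     # (batch, seq_len, latent_dim)
        mask = attention_mask.unsqueeze(-1).float()
        return (o * mask).sum(1) / mask.sum(1).clamp(min=1)   # masked mean pooling

emb = LatentAttentionPooling(hidden_dim=64, num_latents=8, latent_dim=64)(
    torch.randn(2, 10, 64), torch.ones(2, 10))
print(emb.shape)  # torch.Size([2, 64])
```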
Abstract: In this work, we introduce ChatQA, a family of conversational question answering (QA) models that obtain GPT-4-level accuracies. Specifically, we propose a two-stage instruction tuning method that can significantly improve the zero-shot conversational QA results from large language models (LLMs). To handle retrieval-augmented generation in conversational QA, we fine-tune a dense retriever on a multi-turn QA dataset, which provides comparable results to using the state-of-the-art query rewriting model while largely reducing deployment cost. Notably, our ChatQA-70B can outperform GPT-4 in terms of average score on 10 conversational QA datasets (54.14 vs. 53.90), without relying on any synthetic data from OpenAI GPT models.
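A hedged sketch of the retrieval setup described above: instead of rewriting the latest question with a separate model, the dialogue history and current question are concatenated into a single query for a dense retriever fine-tuned on multi-turn data. The toy bag-of-words encoder below is only a stand-in for that retriever so the example runs without any model weights.

```python
# Toy multi-turn retrieval sketch: whole conversation as the retriever query.
import math
from collections import Counter

def encode(text):
    # Stand-in for a fine-tuned dense text encoder.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

dialogue = ["User: Who founded NVIDIA?",
            "Assistant: Jensen Huang, Chris Malachowsky, and Curtis Priem.",
            "User: When was it founded?"]
query = encode(" ".join(dialogue))          # no query rewriting, just the history

passages = ["NVIDIA was founded in 1993 in California.",
            "GPUs accelerate deep learning workloads."]
ranked = sorted(passages, key=lambda p: cosine(query, encode(p)), reverse=True)
print(ranked[0])
```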
Abstract: Standard frame-based cameras that sample light-intensity frames are heavily impacted by motion blur during high-speed motion and fail to perceive the scene accurately when the dynamic range is high. Event-based cameras, on the other hand, overcome these limitations by asynchronously detecting variations in individual pixel intensities. However, event cameras only provide information about pixels in motion, leading to sparse data; hence, estimating the overall dense behavior of pixels is difficult. To address these sensor-specific issues, we present Fusion-FlowNet, a sensor-fusion framework for energy-efficient optical flow estimation that uses both frame- and event-based sensors, leveraging their complementary characteristics. Our proposed network architecture is itself a fusion of Spiking Neural Networks (SNNs) and Analog Neural Networks (ANNs), where the two networks are designed to process asynchronous event streams and regular frame-based images, respectively. The network is trained end-to-end with unsupervised learning to avoid expensive video annotations. The method generalizes well across distinct environments (rapid motion and challenging lighting conditions) and demonstrates state-of-the-art optical flow prediction on the Multi-Vehicle Stereo Event Camera (MVSEC) dataset. Furthermore, our network offers substantial savings in the number of network parameters and computational energy cost.
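The skeleton below illustrates the general fusion pattern described above, not the paper's actual architecture: a simple integrate-and-fire spiking encoder accumulates event inputs over time while a conventional convolutional encoder processes intensity frames, and their features are concatenated before a small flow decoder. Layer sizes, the neuron model, and the input format are assumptions for illustration.

```python
# Illustrative SNN+ANN sensor-fusion skeleton for optical flow (not the paper's model).
import torch
import torch.nn as nn

class TinyFusionFlow(nn.Module):
    def __init__(self):
        super().__init__()
        self.event_conv = nn.Conv2d(2, 16, 3, padding=1)   # 2 event polarity channels
        self.frame_enc = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU())
        self.decoder = nn.Conv2d(32, 2, 3, padding=1)      # 2-channel (u, v) flow

    def forward(self, event_bins, frame, threshold=1.0):
        # event_bins: (T, B, 2, H, W) spike inputs; frame: (B, 1, H, W)
        mem = torch.zeros_like(self.event_conv(event_bins[0]))
        spike_sum = torch.zeros_like(mem)
        for t in range(event_bins.shape[0]):                # IF-style accumulation
            mem = mem + self.event_conv(event_bins[t])
            spikes = (mem >= threshold).float()
            mem = mem * (1 - spikes)                        # reset neurons that fired
            spike_sum = spike_sum + spikes
        fused = torch.cat([spike_sum, self.frame_enc(frame)], dim=1)
        return self.decoder(fused)

flow = TinyFusionFlow()(torch.rand(5, 1, 2, 32, 32), torch.rand(1, 1, 32, 32))
print(flow.shape)  # torch.Size([1, 2, 32, 32])
```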
Abstract: Spiking Neural Networks (SNNs) are being explored to emulate the astounding capabilities of the human brain, which can learn and compute functions robustly and efficiently with noisy spiking activities. A variety of spiking neuron models have been proposed to resemble biological neuronal functionality. With varying levels of bio-fidelity, these models often contain a leak path in their internal states, called membrane potentials. While leaky models have been argued to be more bio-plausible, a comparative analysis between models with and without leak from a purely computational point of view demands attention. In this paper, we investigate the justification for leak and the pros and cons of leaky behavior. Our experimental results reveal that the leaky neuron model provides improved robustness and better generalization compared to models with no leak. However, contrary to the common notion, leak decreases the sparsity of computation. Through a frequency-domain analysis, we demonstrate the effect of leak in eliminating high-frequency components from the input, thus enabling SNNs to be more robust against noisy spike inputs.
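A small numerical sketch of the low-pass effect of leak: with leak factor lambda < 1, the membrane potential v[t] = lambda * v[t-1] + I[t] discounts older inputs, so weak persistent noise is less likely to cross the firing threshold than in a pure integrate-and-fire neuron (lambda = 1). The leak factor, threshold, and inputs below are illustrative values, not taken from the paper.

```python
# Toy comparison of integrate-and-fire (leak=1.0) vs. leaky integrate-and-fire.
def simulate(inputs, leak=1.0, threshold=2.0):
    v, spikes = 0.0, []
    for i in inputs:
        v = leak * v + i          # leaky integration of the input current
        if v >= threshold:
            spikes.append(1)
            v = 0.0               # hard reset after firing
        else:
            spikes.append(0)
    return spikes

noisy_input = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]      # weak but persistent noise
print(simulate(noisy_input, leak=1.0))            # IF: integrates the noise and fires
print(simulate(noisy_input, leak=0.5))            # LIF: leak suppresses the noise
```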
Abstract: Event-based cameras show great potential in conditions such as high-speed motion detection and navigation in low-light environments, where conventional frame-based cameras suffer critically. This is attributed to their high temporal resolution, high dynamic range, and low power consumption. However, conventional computer vision methods, as well as deep Analog Neural Networks (ANNs), are not well suited to the asynchronous and discrete nature of event camera outputs. Spiking Neural Networks (SNNs) serve as ideal paradigms for handling event camera outputs, but deep SNNs suffer in performance due to the spike vanishing phenomenon. To overcome these issues, we present Spike-FlowNet, a deep hybrid neural network architecture integrating SNNs and ANNs to efficiently estimate optical flow from sparse event camera outputs without sacrificing performance. The network is trained end-to-end with self-supervised learning on the Multi-Vehicle Stereo Event Camera (MVSEC) dataset. Spike-FlowNet outperforms its corresponding ANN-based method in optical flow prediction capability while providing significant computational efficiency.
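The sketch below shows the kind of self-supervised signal commonly used in this line of event-based flow work: the second grayscale image is warped toward the first with the predicted flow, and the photometric difference is penalized, so no ground-truth flow labels are needed. The exact loss in the paper (e.g., additional smoothness terms or robust penalties) may differ; shapes and normalization here are illustrative.

```python
# Hedged sketch of a photometric warping loss for self-supervised optical flow.
import torch
import torch.nn.functional as F

def photometric_loss(img1, img2, flow):
    # img1, img2: (B, 1, H, W) grayscale frames; flow: (B, 2, H, W) pixel displacements
    b, _, h, w = img1.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1).float().unsqueeze(0).expand(b, -1, -1, -1)
    warped_coords = grid + flow.permute(0, 2, 3, 1)          # where each pixel moved to
    # Normalize sampling coordinates to [-1, 1] for grid_sample.
    x = 2 * warped_coords[..., 0] / (w - 1) - 1
    y = 2 * warped_coords[..., 1] / (h - 1) - 1
    img2_warped = F.grid_sample(img2, torch.stack([x, y], dim=-1), align_corners=True)
    return (img1 - img2_warped).abs().mean()                 # L1 photometric error

loss = photometric_loss(torch.rand(1, 1, 32, 32), torch.rand(1, 1, 32, 32),
                        torch.zeros(1, 2, 32, 32))
print(loss.item())
```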
Abstract: The functionality of machine learning models is increasingly threatened by adversarial attacks, so finding models that are resilient to such attacks is important for making artificial neural networks robust. In this work, we present, for the first time, a comprehensive analysis of the behavior of a more bio-plausible class of networks, namely Spiking Neural Networks (SNNs), under state-of-the-art adversarial tests. We perform a comparative study of the accuracy degradation of a conventional VGG-9 Artificial Neural Network (ANN) and an equivalent spiking network on the CIFAR-10 dataset in both white-box and black-box settings for different types of single-step and multi-step FGSM (Fast Gradient Sign Method) attacks. We demonstrate that SNNs tend to show more resiliency than ANNs under the black-box attack scenario. Additionally, we find that SNN robustness depends largely on the corresponding training mechanism: SNNs trained by spike-based backpropagation are more adversarially robust than those obtained by ANN-to-SNN conversion rules in several white-box and black-box scenarios. Finally, we propose a simple yet effective framework for crafting adversarial attacks from SNNs. Our results suggest that attacks crafted from SNNs following our proposed method are much stronger than those crafted from ANNs.
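For reference, the single-step FGSM attack evaluated above follows the standard formulation: perturb the input by epsilon in the direction of the sign of the loss gradient. The sketch below is this generic formulation with a stand-in classifier, not tied to the specific SNN or ANN implementations in the paper; epsilon and the toy model are assumptions.

```python
# Minimal FGSM sketch (standard single-step attack).
import torch
import torch.nn as nn

def fgsm_attack(model, images, labels, epsilon=8 / 255):
    images = images.clone().detach().requires_grad_(True)
    loss = nn.CrossEntropyLoss()(model(images), labels)
    loss.backward()
    adv = images + epsilon * images.grad.sign()       # single-step perturbation
    return adv.clamp(0, 1).detach()                   # keep a valid pixel range

# Toy usage with a stand-in classifier on CIFAR-10-sized inputs.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x, y = torch.rand(4, 3, 32, 32), torch.randint(0, 10, (4,))
x_adv = fgsm_attack(model, x, y)
print((x_adv - x).abs().max().item())  # bounded by epsilon
```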
Abstract: Spiking Neural Networks (SNNs) have recently emerged as a prominent neural computing paradigm. However, typical shallow spiking network architectures have limited capacity for expressing complex representations, while training very deep spiking networks has not been successful so far. Diverse methods have been proposed to get around this issue, such as converting offline-trained deep Artificial Neural Networks (ANNs) to SNNs. However, the ANN-to-SNN conversion scheme fails to capture the temporal dynamics of a spiking system. On the other hand, directly training deep SNNs on input spike events remains difficult due to the discontinuous and non-differentiable nature of spike signals. To overcome this problem, we propose a differentiable (but approximate) activation for Leaky Integrate-and-Fire (LIF) spiking neurons, enabling deep convolutional SNNs to be trained on input spike events with a spike-based backpropagation algorithm. Our experiments show the effectiveness of the proposed spike-based learning strategy on state-of-the-art deep networks (VGG and Residual architectures), achieving the best classification accuracies on the MNIST, SVHN, and CIFAR-10 datasets compared to other SNNs trained with spike-based learning. Moreover, we analyze sparse event-driven computations to demonstrate the efficacy of the proposed SNN training method for inference in the spiking domain.
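The core trick described above can be sketched as a custom autograd function: the forward pass keeps the hard, non-differentiable spike (step) function, while the backward pass substitutes an approximate derivative concentrated around the threshold. The triangular surrogate used below is one common choice and an assumption; the paper's exact approximate activation may differ.

```python
# Hedged sketch of spike generation with an approximate (surrogate) gradient.
import torch

class SpikeFn(torch.autograd.Function):
    @staticmethod
    def forward(ctx, mem, threshold=1.0):
        ctx.save_for_backward(mem)
        ctx.threshold = threshold
        return (mem >= threshold).float()            # hard step in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        (mem,) = ctx.saved_tensors
        # Approximate d(spike)/d(mem): nonzero only near the firing threshold.
        surrogate = torch.clamp(1.0 - torch.abs(mem - ctx.threshold), min=0.0)
        return grad_output * surrogate, None

mem = torch.tensor([0.2, 0.9, 1.3], requires_grad=True)
spikes = SpikeFn.apply(mem)
spikes.sum().backward()
print(spikes.tolist(), mem.grad.tolist())
```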