Abstract: We conducted extensive experiments on domain adaptation of the Meta-Llama-3-70B-Instruct model on SEC data, exploring its performance on both general and domain-specific benchmarks. Our focus included continual pre-training (CPT) and model merging, aiming to enhance the model's domain-specific capabilities while mitigating catastrophic forgetting. Through this study, we evaluated the impact of integrating financial regulatory data into a robust language model and examined the effectiveness of our model merging techniques in preserving and improving the model's instruction-following abilities. The model is accessible on Hugging Face at https://huggingface.co/arcee-ai/Llama-3-SEC-Base. This is an intermediate checkpoint of our final model, which has seen 20B tokens of training so far; the full model is still in training. This is a preprint technical report with thorough evaluations to understand the entire process.
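In its simplest form, model merging can be illustrated as a weighted average of the parameters of the continually pre-trained model and the original instruct model. The sketch below uses toy torch modules and plain linear interpolation; the merging technique actually used for Llama-3-SEC may differ (for example, more elaborate methods such as those implemented in MergeKit).

```python
import torch
import torch.nn as nn

def merge_state_dicts(sd_a, sd_b, alpha=0.5):
    """Linearly interpolate two state dicts, weighting sd_a by alpha."""
    return {key: alpha * sd_a[key] + (1.0 - alpha) * sd_b[key] for key in sd_a}

# Toy stand-ins for a base instruct model and a continually pre-trained (CPT) model.
base_model = nn.Linear(16, 16)
cpt_model = nn.Linear(16, 16)

merged_sd = merge_state_dicts(base_model.state_dict(), cpt_model.state_dict(), alpha=0.5)

merged_model = nn.Linear(16, 16)
merged_model.load_state_dict(merged_sd)
```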
Abstract: Efficiently post-training large language models remains a challenging task due to the vast computational resources required. We present Spectrum, a method that accelerates LLM training by selectively targeting layer modules based on their signal-to-noise ratio (SNR) and freezing the remaining modules. Our approach, which uses an algorithm to compute module SNRs prior to training, has been shown to match the performance of full fine-tuning while reducing GPU memory usage. Experiments comparing Spectrum to existing methods such as QLoRA demonstrate its effectiveness in terms of model quality and VRAM efficiency in distributed environments.
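The core idea, scoring each module before training and keeping only the highest-scoring ones trainable, can be sketched as follows. The SNR proxy below (largest singular value over the mean of the rest) and the selection fraction are hypothetical stand-ins for illustration; Spectrum's actual SNR computation may differ.

```python
import torch
import torch.nn as nn

def module_snr(weight: torch.Tensor) -> float:
    """Crude SNR proxy: largest singular value over the mean of the remaining ones.
    (Illustrative stand-in; not necessarily Spectrum's exact SNR formula.)"""
    s = torch.linalg.svdvals(weight.detach().float())
    if s.numel() < 2:
        return float("inf")
    return (s[0] / s[1:].mean()).item()

def freeze_low_snr_modules(model: nn.Module, top_fraction: float = 0.25) -> None:
    """Keep the highest-SNR linear modules trainable and freeze the rest."""
    scored = [
        (name, module, module_snr(module.weight))
        for name, module in model.named_modules()
        if isinstance(module, nn.Linear)
    ]
    scored.sort(key=lambda item: item[2], reverse=True)
    keep = int(len(scored) * top_fraction)
    for i, (name, module, _snr) in enumerate(scored):
        for p in module.parameters():
            p.requires_grad = i < keep

# Example with a toy model: roughly a third of the linear modules stay trainable.
model = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 8),
)
freeze_low_snr_modules(model, top_fraction=0.34)
```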
Abstract: The aim of this paper is to discuss the use of Haar scattering networks, a very simple architecture that naturally supports a large number of stacked layers yet has very few parameters, in a relatively broad set of pattern recognition problems, including regression and classification tasks. This architecture essentially consists of stacked convolutional filters, which can be thought of as a generalization of Haar wavelets, followed by non-linear operators that aim to extract symmetries and invariances; the resulting features are then fed into a classification/regression algorithm. We show that good results can be obtained with the proposed method for both kinds of tasks. We outperformed the best available algorithms in 4 out of 18 important data classification problems, and obtained more robust performance than the ARIMA and ETS time-series methods in regression problems for data with strong periodicities.
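A minimal sketch of the Haar scattering feature extraction described above, assuming the simplest possible pairing of adjacent samples (actual Haar scattering networks typically choose pairings more carefully):

```python
import numpy as np

def haar_scattering_layer(x: np.ndarray) -> np.ndarray:
    """One Haar scattering layer on a 1D signal of even length:
    adjacent pairs are mapped to their sum and the absolute value of their difference."""
    a, b = x[0::2], x[1::2]
    return np.concatenate([a + b, np.abs(a - b)])

def haar_scattering_features(x: np.ndarray, depth: int) -> np.ndarray:
    """Stack several Haar scattering layers and return the resulting feature vector."""
    for _ in range(depth):
        x = haar_scattering_layer(x)
    return x

signal = np.sin(np.linspace(0, 4 * np.pi, 16)) + 0.1 * np.random.randn(16)
features = haar_scattering_features(signal, depth=3)
# `features` can then be fed into any off-the-shelf classifier or regressor.
```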
Abstract: In this article we propose building general-purpose function approximators on top of Haar Scattering Networks. We argue that this architecture allows a clearer understanding of feature extraction, in addition to being simple to implement and computationally cheap. We show its approximation and feature extraction capabilities in a wide range of problems, with applications to phenomena in signal processing, system identification, econometrics, and other potential fields.
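One way to read "function approximators on top of Haar Scattering Networks" is to use the scattering transform as a fixed feature map with a simple trainable readout. The sketch below fits a least-squares linear readout on toy data; this is only one possible choice of approximator and target, not necessarily the setup used in the paper.

```python
import numpy as np

def haar_features(x: np.ndarray, depth: int = 3) -> np.ndarray:
    # Repeated Haar step: sums and absolute differences of adjacent pairs.
    for _ in range(depth):
        a, b = x[..., 0::2], x[..., 1::2]
        x = np.concatenate([a + b, np.abs(a - b)], axis=-1)
    return x

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))              # toy input signals
y = np.sin(X.sum(axis=1))                   # toy target function to approximate

Phi = haar_features(X)                      # fixed (untrained) Haar scattering features
w, *_ = np.linalg.lstsq(Phi, y, rcond=None) # linear readout fitted by least squares

y_hat = haar_features(rng.normal(size=(10, 16))) @ w  # predictions on new inputs
```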
Abstract: The present paper aims to demonstrate the use of Convolutional Neural Networks as a generative model for stochastic processes, enabling researchers from a wide range of fields (such as quantitative finance and physics) to develop a general tool for forecasting and simulation without the need to identify or assume a specific system structure or estimate its parameters.
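A minimal sketch of this idea: train a small 1D CNN to map a window of past observations of a process to the next value, then simulate new paths by feeding predictions (optionally with added noise) back into the input window. The architecture, window length, and training setup below are illustrative assumptions, not the paper's exact model.

```python
import torch
import torch.nn as nn

class CNNForecaster(nn.Module):
    """Small 1D CNN mapping a window of past observations to the next value."""
    def __init__(self, window: int = 32, channels: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(channels * window, 1),
        )

    def forward(self, x):  # x: (batch, 1, window)
        return self.net(x)

# Toy training data: windows of an AR(1)-like process and their next values.
torch.manual_seed(0)
series = torch.zeros(2000)
for t in range(1, 2000):
    series[t] = 0.8 * series[t - 1] + 0.1 * torch.randn(1)

window = 32
X = torch.stack([series[i:i + window] for i in range(2000 - window - 1)]).unsqueeze(1)
y = torch.stack([series[i + window] for i in range(2000 - window - 1)]).unsqueeze(1)

model = CNNForecaster(window)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(5):  # a few full-batch epochs, for illustration only
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(X), y)
    loss.backward()
    opt.step()
# Simulation: roll the model forward by appending each prediction (plus optional noise)
# to the input window to generate synthetic sample paths.
```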