Abstract: This research examines the effectiveness of OpenAI's GPT models as independent evaluators of text summaries generated by six transformer-based models from Hugging Face: DistilBART, BERT, ProphetNet, T5, BART, and PEGASUS. We evaluated these summaries on the essential properties of a high-quality summary (conciseness, relevance, coherence, and readability) using traditional metrics such as ROUGE and Latent Semantic Analysis (LSA). Uniquely, we also employed GPT not as a summarizer but as an evaluator, allowing it to assess summary quality independently, without predefined metrics. Our analysis revealed significant correlations between GPT evaluations and traditional metrics, particularly in assessing relevance and coherence. The results demonstrate GPT's potential as a robust tool for evaluating text summaries, offering insights that complement established metrics and providing a basis for comparative analysis of transformer-based models in natural language processing tasks.
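The comparison described above, between a traditional overlap metric and an evaluator's scores, can be illustrated with a minimal sketch. It assumes a simplified ROUGE-1 F1 (whitespace tokenization, no stemming) and a Spearman rank correlation; the abstract does not state which ROUGE variant or correlation coefficient was used, and the function names here are hypothetical.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap, whitespace tokens."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

Given one list of metric scores and one list of evaluator scores for the same summaries, `spearman(metric_scores, evaluator_scores)` returns a value in [-1, 1]; values near 1 indicate that the two rank the summaries similarly.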
Abstract: The use of wrist accelerometers for assessing hallmark measures of physical activity (PA) is growing rapidly with the advent of smartwatch technology. Given the growing popularity of wrist-worn accelerometers, their ability to recognize PA type and estimate energy expenditure (EE) across the lifespan needs rigorous evaluation. Participants (66% women, aged 20-89 yrs) performed a battery of 33 daily activities in a standardized laboratory setting while a tri-axial accelerometer collected data from the right wrist. A portable metabolic unit was worn to measure metabolic intensity. We built deep learning networks to extract spatial and temporal representations from the time-series data and used them to recognize PA type and estimate EE. The deep learning models achieved high performance: the F1 scores were 0.82, 0.81, and 0.95 for recognizing sedentary, locomotor, and lifestyle activities, respectively. The root mean square error was 1.1 (+/-0.13) for the estimation of EE.
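For readers unfamiliar with the two reported measures, the following is a minimal sketch of how a per-class F1 score and a root mean square error are computed. It assumes the F1 scores are calculated one class against the rest, which the abstract does not state; the function names are hypothetical and any inputs passed to them are illustrative, not study data.

```python
import math


def f1_score(y_true, y_pred, positive):
    """F1 for one class, treating every other label as negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def rmse(actual, predicted):
    """Root mean square error between measured and estimated values."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

F1 balances precision and recall for a single activity class, so it is reported once per class, while RMSE summarizes the typical size of the EE estimation error in the same units as the measured values.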