Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Si Shen

RevOrder: A Novel Method for Enhanced Arithmetic in Language Models

Feb 06, 2024

Si Shen, Peijun Shen, Danhao Zhu

Abstract:This paper presents RevOrder, a novel technique aimed at improving arithmetic operations in large language models (LLMs) by reversing the output digits in addition, subtraction, and n-digit by 1-digit (nD by 1D) multiplication tasks. Our method significantly reduces the Count of Sequential Intermediate Digits (CSID) to $\mathcal{O}(1)$, a new metric we introduce to assess equation complexity. Through comprehensive testing, RevOrder not only achieves perfect accuracy in basic arithmetic operations but also substantially boosts LLM performance in division tasks, particularly with large numbers where traditional models struggle. Implementation of RevOrder is cost-effective for both training and inference phases. Moreover, applying RevOrder to fine-tune the LLaMA2-7B model on the GSM8K math task results in a considerable improvement, reducing equation calculation errors by 46% and increasing overall scores from 41.6 to 44.4.

Via

Access Paper or Ask Questions

GujiBERT and GujiGPT: Construction of Intelligent Information Processing Foundation Language Models for Ancient Texts

Jul 11, 2023

Dongbo Wang, Chang Liu, Zhixiao Zhao, Si Shen, Liu Liu, Bin Li, Haotian Hu, Mengcheng Wu, Litao Lin, Xue Zhao(+1 more)

Figure 1 for GujiBERT and GujiGPT: Construction of Intelligent Information Processing Foundation Language Models for Ancient Texts

Figure 2 for GujiBERT and GujiGPT: Construction of Intelligent Information Processing Foundation Language Models for Ancient Texts

Figure 3 for GujiBERT and GujiGPT: Construction of Intelligent Information Processing Foundation Language Models for Ancient Texts

Figure 4 for GujiBERT and GujiGPT: Construction of Intelligent Information Processing Foundation Language Models for Ancient Texts

Abstract:In the context of the rapid development of large language models, we have meticulously trained and introduced the GujiBERT and GujiGPT language models, which are foundational models specifically designed for intelligent information processing of ancient texts. These models have been trained on an extensive dataset that encompasses both simplified and traditional Chinese characters, allowing them to effectively handle various natural language processing tasks related to ancient books, including but not limited to automatic sentence segmentation, punctuation, word segmentation, part-of-speech tagging, entity recognition, and automatic translation. Notably, these models have exhibited exceptional performance across a range of validation tasks using publicly available datasets. Our research findings highlight the efficacy of employing self-supervised methods to further train the models using classical text corpora, thus enhancing their capability to tackle downstream tasks. Moreover, it is worth emphasizing that the choice of font, the scale of the corpus, and the initial model selection all exert significant influence over the ultimate experimental outcomes. To cater to the diverse text processing preferences of researchers in digital humanities and linguistics, we have developed three distinct categories comprising a total of nine model variations. We believe that by sharing these foundational language models specialized in the domain of ancient texts, we can facilitate the intelligent processing and scholarly exploration of ancient literary works and, consequently, contribute to the global dissemination of China's rich and esteemed traditional culture in this new era.

* 22pages,0 figure

Via

Access Paper or Ask Questions

Increasing Visual Awareness in Multimodal Neural Machine Translation from an Information Theoretic Perspective

Oct 16, 2022

Baijun Ji, Tong Zhang, Yicheng Zou, Bojie Hu, Si Shen

Figure 1 for Increasing Visual Awareness in Multimodal Neural Machine Translation from an Information Theoretic Perspective

Figure 2 for Increasing Visual Awareness in Multimodal Neural Machine Translation from an Information Theoretic Perspective

Figure 3 for Increasing Visual Awareness in Multimodal Neural Machine Translation from an Information Theoretic Perspective

Figure 4 for Increasing Visual Awareness in Multimodal Neural Machine Translation from an Information Theoretic Perspective

Abstract:Multimodal machine translation (MMT) aims to improve translation quality by equipping the source sentence with its corresponding image. Despite the promising performance, MMT models still suffer the problem of input degradation: models focus more on textual information while visual information is generally overlooked. In this paper, we endeavor to improve MMT performance by increasing visual awareness from an information theoretic perspective. In detail, we decompose the informative visual signals into two parts: source-specific information and target-specific information. We use mutual information to quantify them and propose two methods for objective optimization to better leverage visual signals. Experiments on two datasets demonstrate that our approach can effectively enhance the visual awareness of MMT model and achieve superior results against strong baselines.

* 10 pages, 4 figures; EMNLP main conference

Via

Access Paper or Ask Questions

SsciBERT: A Pre-trained Language Model for Social Science Texts

Jun 11, 2022

Si Shen, Jiangfeng Liu, Litao Lin, Ying Huang, Lin Zhang, Chang Liu, Yutong Feng, Dongbo Wang

Figure 1 for SsciBERT: A Pre-trained Language Model for Social Science Texts

Figure 2 for SsciBERT: A Pre-trained Language Model for Social Science Texts

Figure 3 for SsciBERT: A Pre-trained Language Model for Social Science Texts

Figure 4 for SsciBERT: A Pre-trained Language Model for Social Science Texts

Abstract:The academic literature of social sciences is the literature that records human civilization and studies human social problems. With the large-scale growth of this literature, ways to quickly find existing research on relevant issues have become an urgent demand for researchers. Previous studies, such as SciBERT, have shown that pre-training using domain-specific texts can improve the performance of natural language processing tasks in those fields. However, there is no pre-trained language model for social sciences, so this paper proposes a pre-trained model on many abstracts published in the Social Science Citation Index (SSCI) journals. The models, which are available on Github (https://github.com/S-T-Full-Text-Knowledge-Mining/SSCI-BERT), show excellent performance on discipline classification and abstract structure-function recognition tasks with the social sciences literature.

* 19 pages,2 figures

Via

Access Paper or Ask Questions

Going Wider: Recurrent Neural Network With Parallel Cells

May 03, 2017

Danhao Zhu, Si Shen, Xin-Yu Dai, Jiajun Chen

Figure 1 for Going Wider: Recurrent Neural Network With Parallel Cells

Figure 2 for Going Wider: Recurrent Neural Network With Parallel Cells

Figure 3 for Going Wider: Recurrent Neural Network With Parallel Cells

Figure 4 for Going Wider: Recurrent Neural Network With Parallel Cells

Abstract:Recurrent Neural Network (RNN) has been widely applied for sequence modeling. In RNN, the hidden states at current step are full connected to those at previous step, thus the influence from less related features at previous step may potentially decrease model's learning ability. We propose a simple technique called parallel cells (PCs) to enhance the learning ability of Recurrent Neural Network (RNN). In each layer, we run multiple small RNN cells rather than one single large cell. In this paper, we evaluate PCs on 2 tasks. On language modeling task on PTB (Penn Tree Bank), our model outperforms state of art models by decreasing perplexity from 78.6 to 75.3. On Chinese-English translation task, our model increases BLEU score for 0.39 points than baseline model.

Via

Access Paper or Ask Questions