Przemyslaw Kazienko

Self-training Large Language Models through Knowledge Detection

Jun 17, 2024

Wei Jie Yeo, Teddy Ferdinan, Przemyslaw Kazienko, Ranjan Satapathy, Erik Cambria

Abstract: Large language models (LLMs) often necessitate extensive labeled datasets and training compute to achieve impressive performance across downstream tasks. This paper explores a self-training paradigm, where the LLM autonomously curates its own labels and selectively trains on unknown data samples identified through a reference-free consistency method. Empirical evaluations demonstrate significant improvements in reducing hallucination in generation across multiple subjects. Furthermore, the selective training framework mitigates catastrophic forgetting in out-of-distribution benchmarks, addressing a critical limitation in training LLMs. Our findings suggest that such an approach can substantially reduce the dependency on large labeled datasets, paving the way for more scalable and cost-effective language model training.

* Under review
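
The abstract does not spell out the consistency method, so the following is only a minimal sketch of one way a reference-free consistency check could flag "unknown" samples: sample several answers per question, measure how often they agree, and keep the low-agreement questions for training. The function names, the exact-match agreement measure, and the 0.6 threshold are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


def consistency_score(answers):
    """Fraction of sampled answers that agree with the most common one.

    Agreement here is case-insensitive exact match, a deliberately
    simple stand-in for whatever measure the paper actually uses.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    normalized = [a.strip().lower() for a in answers]
    most_common_count = Counter(normalized).most_common(1)[0][1]
    return most_common_count / len(normalized)


def select_unknown(samples, threshold=0.6):
    """Return questions whose sampled answers are inconsistent.

    No gold labels are needed: only the model's own samples are compared.
    The threshold is a hypothetical value chosen for this example.
    """
    return [q for q, answers in samples.items()
            if consistency_score(answers) < threshold]


# Hypothetical sampled answers for two questions.
samples = {
    "capital of France?": ["Paris", "paris", "Paris ", "Paris"],
    "year of obscure event?": ["1871", "1902", "1888", "1871"],
}
unknown = select_unknown(samples)
```

Under these assumptions only the second question falls below the threshold, so it alone would be routed to the selective training step.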

RWKV: Reinventing RNNs for the Transformer Era

May 22, 2023

Bo Peng, Eric Alcaide, Quentin Anthony, Alon Albalak, Samuel Arcadinho, Huanqi Cao, Xin Cheng, Michael Chung, Matteo Grella, Kranthi Kiran GV(+20 more)

Abstract: Transformers have revolutionized almost all natural language processing (NLP) tasks but suffer from memory and computational complexity that scales quadratically with sequence length. In contrast, recurrent neural networks (RNNs) exhibit linear scaling in memory and computational requirements but struggle to match the performance of Transformers due to limitations in parallelization and scalability. We propose a novel model architecture, Receptance Weighted Key Value (RWKV), that combines the efficient parallelizable training of Transformers with the efficient inference of RNNs. Our approach leverages a linear attention mechanism and allows us to formulate the model as either a Transformer or an RNN, which parallelizes computations during training and maintains constant computational and memory complexity during inference, leading to the first non-transformer architecture to be scaled to tens of billions of parameters. Our experiments reveal that RWKV performs on par with similarly sized Transformers, suggesting that future work can leverage this architecture to create more efficient models. This work presents a significant step towards reconciling the trade-offs between computational efficiency and model performance in sequence processing tasks.
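
The claim that a linear attention model can be written either in a Transformer-like form or as an RNN can be illustrated with a small sketch. The code below uses generic causal linear attention with an `elu(x) + 1` feature map, which is an assumption chosen for illustration; it is not RWKV's actual operator. It computes the same outputs two ways: over all query-key pairs, and with a fixed-size running state updated once per token.

```python
import math


def feature_map(x):
    # elu(x) + 1: a positive feature map (an assumption, not RWKV's choice).
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def parallel_form(Q, K, V):
    """Causal linear attention over all (query, key) pairs: O(T^2) work."""
    out = []
    for t, q in enumerate(Q):
        fq = feature_map(q)
        weights = [dot(fq, feature_map(K[i])) for i in range(t + 1)]
        total = sum(weights)
        out.append([sum(w * V[i][j] for i, w in enumerate(weights)) / total
                    for j in range(len(V[0]))])
    return out


def recurrent_form(Q, K, V):
    """Same outputs from a fixed-size state, updated once per token."""
    d_k, d_v = len(K[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d_k)]  # running sum of phi(k) v^T
    z = [0.0] * d_k                        # running sum of phi(k)
    out = []
    for q, k, v in zip(Q, K, V):
        fk = feature_map(k)
        for a in range(d_k):
            z[a] += fk[a]
            for j in range(d_v):
                S[a][j] += fk[a] * v[j]
        fq = feature_map(q)
        denom = dot(fq, z)
        out.append([sum(fq[a] * S[a][j] for a in range(d_k)) / denom
                    for j in range(d_v)])
    return out


# Toy sequence of three tokens with 2-dimensional queries, keys, and values.
Q = [[0.1, -0.2], [0.5, 0.3], [-0.4, 0.9]]
K = [[0.2, 0.1], [-0.3, 0.8], [0.7, -0.5]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
par = parallel_form(Q, K, V)
rec = recurrent_form(Q, K, V)
```

The two results agree up to floating-point error. The recurrent version keeps only `S` and `z`, whose sizes do not depend on sequence length, which is the sense in which per-token inference cost stays constant.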

Label-dependent Feature Extraction in Social Networks for Node Classification

Mar 01, 2013

Tomasz Kajdanowicz, Przemyslaw Kazienko, Piotr Doskocz

Abstract: This paper proposes a new method of feature extraction in social networks for within-network classification. The method derives new features by combining network structure information with the class labels assigned to nodes. The influence of various features on classification performance is also studied. Experiments on real-world data show that features created with the proposed method can significantly improve classification accuracy.

* Kajdanowicz T., Kazienko P., Doskocz P.: Label-dependent Feature Extraction in Social Networks for Node Classification. Lecture Notes in Artificial Intelligence LNAI 6430, Springer, 2010, pp. 89-102
* feature extraction, label-dependent features, classification, social network analysis, AMD social network
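
As a rough illustration of what a feature combining network structure with class labels might look like, the sketch below computes, for each node, the share of its labeled neighbors that belong to each class. This particular feature, the function name, and the toy graph are assumptions made for the example; the abstract does not say which features the paper defines.

```python
def label_dependent_features(edges, labels, classes):
    """For each node, the fraction of its labeled neighbors in each class.

    edges   : iterable of undirected (u, v) pairs
    labels  : dict mapping the labeled nodes to their class
    classes : ordered list of class names (fixes the feature order)
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    features = {}
    for node, nbrs in neighbors.items():
        # Only neighbors with a known label contribute.
        labeled = [labels[n] for n in nbrs if n in labels]
        features[node] = [
            labeled.count(c) / len(labeled) if labeled else 0.0
            for c in classes
        ]
    return features


# Toy network: node "a" is unlabeled, the others carry class "x" or "y".
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
labels = {"b": "x", "c": "y", "d": "x"}
features = label_dependent_features(edges, labels, ["x", "y"])
```

Here the unlabeled node "a" gets a feature vector of roughly [0.67, 0.33], which a downstream classifier could use alongside purely structural features such as degree.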