Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Tony Smith

Detection of Human and Machine-Authored Fake News in Urdu

Oct 25, 2024

Muhammad Zain Ali, Yuxia Wang, Bernhard Pfahringer, Tony Smith

Abstract:The rise of social media has amplified the spread of fake news, now further complicated by large language models (LLMs) like ChatGPT, which ease the generation of highly convincing, error-free misinformation, making it increasingly challenging for the public to discern truth from falsehood. Traditional fake news detection methods relying on linguistic cues also becomes less effective. Moreover, current detectors primarily focus on binary classification and English texts, often overlooking the distinction between machine-generated true vs. fake news and the detection in low-resource languages. To this end, we updated detection schema to include machine-generated news with focus on the Urdu language. We further propose a hierarchical detection strategy to improve the accuracy and robustness. Experiments show its effectiveness across four datasets in various settings.

Via

Access Paper or Ask Questions

Improving Predictions of Tail-end Labels using Concatenated BioMed-Transformers for Long Medical Documents

Dec 03, 2021

Vithya Yogarajan, Bernhard Pfahringer, Tony Smith, Jacob Montiel

Figure 1 for Improving Predictions of Tail-end Labels using Concatenated BioMed-Transformers for Long Medical Documents

Figure 2 for Improving Predictions of Tail-end Labels using Concatenated BioMed-Transformers for Long Medical Documents

Figure 3 for Improving Predictions of Tail-end Labels using Concatenated BioMed-Transformers for Long Medical Documents

Figure 4 for Improving Predictions of Tail-end Labels using Concatenated BioMed-Transformers for Long Medical Documents

Abstract:Multi-label learning predicts a subset of labels from a given label set for an unseen instance while considering label correlations. A known challenge with multi-label classification is the long-tailed distribution of labels. Many studies focus on improving the overall predictions of the model and thus do not prioritise tail-end labels. Improving the tail-end label predictions in multi-label classifications of medical text enables the potential to understand patients better and improve care. The knowledge gained by one or more infrequent labels can impact the cause of medical decisions and treatment plans. This research presents variations of concatenated domain-specific language models, including multi-BioMed-Transformers, to achieve two primary goals. First, to improve F1 scores of infrequent labels across multi-label problems, especially with long-tail labels; second, to handle long medical text and multi-sourced electronic health records (EHRs), a challenging task for standard transformers designed to work on short input sequences. A vital contribution of this research is new state-of-the-art (SOTA) results obtained using TransformerXL for predicting medical codes. A variety of experiments are performed on the Medical Information Mart for Intensive Care (MIMIC-III) database. Results show that concatenated BioMed-Transformers outperform standard transformers in terms of overall micro and macro F1 scores and individual F1 scores of tail-end labels, while incurring lower training times than existing transformer-based solutions for long input sequences.

Via

Access Paper or Ask Questions

Predicting COVID-19 Patient Shielding: A Comprehensive Study

Oct 01, 2021

Vithya Yogarajan, Jacob Montiel, Tony Smith, Bernhard Pfahringer

Figure 1 for Predicting COVID-19 Patient Shielding: A Comprehensive Study

Figure 2 for Predicting COVID-19 Patient Shielding: A Comprehensive Study

Figure 3 for Predicting COVID-19 Patient Shielding: A Comprehensive Study

Figure 4 for Predicting COVID-19 Patient Shielding: A Comprehensive Study

Abstract:There are many ways machine learning and big data analytics are used in the fight against the COVID-19 pandemic, including predictions, risk management, diagnostics, and prevention. This study focuses on predicting COVID-19 patient shielding -- identifying and protecting patients who are clinically extremely vulnerable from coronavirus. This study focuses on techniques used for the multi-label classification of medical text. Using the information published by the United Kingdom NHS and the World Health Organisation, we present a novel approach to predicting COVID-19 patient shielding as a multi-label classification problem. We use publicly available, de-identified ICU medical text data for our experiments. The labels are derived from the published COVID-19 patient shielding data. We present an extensive comparison across 12 multi-label classifiers from the simple binary relevance to neural networks and the most recent transformers. To the best of our knowledge this is the first comprehensive study, where such a range of multi-label classifiers for medical text are considered. We highlight the benefits of various approaches, and argue that, for the task at hand, both predictive accuracy and processing time are essential.

* The 2021 Australasian Joint Conference on Artificial Intelligence (AJCAI 2021)
* Accepted in AJCAI 2021

Via

Access Paper or Ask Questions

Seeing The Whole Patient: Using Multi-Label Medical Text Classification Techniques to Enhance Predictions of Medical Codes

Mar 29, 2020

Vithya Yogarajan, Jacob Montiel, Tony Smith, Bernhard Pfahringer

Figure 1 for Seeing The Whole Patient: Using Multi-Label Medical Text Classification Techniques to Enhance Predictions of Medical Codes

Figure 2 for Seeing The Whole Patient: Using Multi-Label Medical Text Classification Techniques to Enhance Predictions of Medical Codes

Figure 3 for Seeing The Whole Patient: Using Multi-Label Medical Text Classification Techniques to Enhance Predictions of Medical Codes

Figure 4 for Seeing The Whole Patient: Using Multi-Label Medical Text Classification Techniques to Enhance Predictions of Medical Codes

Abstract:Machine learning-based multi-label medical text classifications can be used to enhance the understanding of the human body and aid the need for patient care. We present a broad study on clinical natural language processing techniques to maximise a feature representing text when predicting medical codes on patients with multi-morbidity. We present results of multi-label medical text classification problems with 18, 50 and 155 labels. We compare several variations to embeddings, text tagging, and pre-processing. For imbalanced data we show that labels which occur infrequently, benefit the most from additional features incorporated in embeddings. We also show that high dimensional embeddings pre-trained using health-related data present a significant improvement in a multi-label setting, similarly to the way they improve performance for binary classification. High dimensional embeddings from this research are made available for public use.

Via

Access Paper or Ask Questions