Abstract: Large language models (LLMs) have significantly advanced natural language processing, excelling in areas like text generation, summarization, and question answering. Despite their capabilities, these models face challenges when fine-tuned on small, domain-specific datasets, often struggling to generalize and deliver accurate results on unfamiliar inputs. To tackle this issue, we introduce RIRO, a novel two-layer architecture designed to improve performance in data-scarce environments. The first layer leverages advanced prompt engineering to reformulate inputs, ensuring better alignment with training data, while the second layer focuses on refining outputs to minimize inconsistencies. We fine-tune models including Phi-2, Falcon 7B, and Falcon 1B, with Phi-2 outperforming the others. Additionally, we introduce a benchmark using evaluation metrics such as cosine similarity, Levenshtein distance, BLEU score, ROUGE-1, ROUGE-2, and ROUGE-L. While these advancements improve performance, challenges like computational demands and overfitting persist, limiting the potential of LLMs in data-scarce, high-stakes environments such as healthcare, legal documentation, and software testing.
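The benchmark above names a concrete metric suite, so a minimal sketch of how such a suite could be assembled may help; the tokenization, BLEU smoothing, and the TF-IDF representation used for cosine similarity below are assumptions standing in for whatever configuration the RIRO authors actually used.

```python
# A hedged sketch of the metrics named in the abstract: cosine similarity,
# Levenshtein distance, BLEU, and ROUGE-1/2/L. Tokenization, smoothing, and
# the text representation for cosine similarity are assumptions, not the
# paper's exact benchmark configuration.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def evaluate(reference: str, candidate: str) -> dict:
    # Cosine similarity over TF-IDF vectors (a stand-in representation).
    tfidf = TfidfVectorizer().fit_transform([reference, candidate])
    cos = float(cosine_similarity(tfidf[0], tfidf[1])[0, 0])

    # Sentence-level BLEU with smoothing, since outputs may be short.
    bleu = sentence_bleu([reference.split()], candidate.split(),
                         smoothing_function=SmoothingFunction().method1)

    # ROUGE-1, ROUGE-2, and ROUGE-L F1 scores.
    scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"],
                                      use_stemmer=True)
    rouge = {k: v.fmeasure for k, v in scorer.score(reference, candidate).items()}

    return {"cosine": cos, "levenshtein": levenshtein(reference, candidate),
            "bleu": bleu, **rouge}


print(evaluate("the model refines outputs", "the model reformulates inputs"))
```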
Abstract: We introduce MMIS, a novel dataset designed to advance MultiModal Interior Scene generation and recognition. MMIS consists of nearly 160,000 images, each accompanied by a corresponding textual description and an audio recording of that description, providing rich and diverse sources of information for scene generation and recognition. MMIS encompasses a wide range of interior spaces, capturing various styles, layouts, and furnishings. To construct the dataset, we followed a careful pipeline of image collection, textual description generation, and corresponding speech annotation. The presented dataset supports research in multi-modal representation learning tasks such as image generation, retrieval, captioning, and classification.
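To make the image-text-audio pairing concrete, here is a minimal PyTorch `Dataset` sketch for consuming triples like those MMIS describes; the directory layout, file extensions, and shared filename stems are hypothetical, since the abstract does not specify the dataset's release format.

```python
# A hedged sketch of loading (image, text, audio) triples like MMIS pairs.
# The directory layout and file extensions below are hypothetical; the
# abstract does not specify the dataset's actual on-disk format.
from pathlib import Path

import torchaudio
from PIL import Image
from torch.utils.data import Dataset


class TriModalSceneDataset(Dataset):
    """Yields one (image, description, waveform) triple per interior scene."""

    def __init__(self, root: str):
        self.root = Path(root)
        # Assume one shared stem per sample, e.g. 00042.jpg / 00042.txt / 00042.wav
        self.stems = sorted(p.stem for p in (self.root / "images").glob("*.jpg"))

    def __len__(self) -> int:
        return len(self.stems)

    def __getitem__(self, idx: int):
        stem = self.stems[idx]
        image = Image.open(self.root / "images" / f"{stem}.jpg").convert("RGB")
        text = (self.root / "texts" / f"{stem}.txt").read_text().strip()
        waveform, sample_rate = torchaudio.load(self.root / "audio" / f"{stem}.wav")
        return image, text, (waveform, sample_rate)
```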
Abstract: Facial Expression Recognition (FER) is vital for understanding interpersonal communication. However, existing classification methods often face challenges such as vulnerability to noise, imbalanced datasets, overfitting, and poor generalization. In this paper, we propose GCF, a novel approach that utilizes Graph Convolutional Networks for FER. GCF integrates Convolutional Neural Networks (CNNs) for feature extraction, using either custom architectures or pretrained models. The extracted visual features are then represented on a graph, enhancing local CNN features with global features via a Graph Convolutional Network (GCN) layer. We evaluate GCF on benchmark datasets including CK+, JAFFE, and FERG. The results show that GCF significantly improves performance over state-of-the-art methods. For example, GCF enhances the accuracy of ResNet18 from 92% to 98% on CK+, from 66% to 89% on JAFFE, and from 94% to 100% on FERG. Similarly, GCF improves the accuracy of VGG16 from 89% to 97% on CK+, from 72% to 92% on JAFFE, and from 96% to 99.49% on FERG. We provide a comprehensive analysis of our approach, demonstrating its effectiveness in capturing nuanced facial expressions. By integrating graph convolutions with CNNs, GCF significantly advances FER, offering improved accuracy and robustness in real-world applications.
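As a rough illustration of the CNN-plus-graph pattern this abstract describes, the sketch below extracts ResNet18 features for a batch of face images, links samples through a cosine-similarity k-NN graph, and mixes in global context with one graph convolution. The k-NN graph construction, layer sizes, and classification head are assumptions; the abstract does not detail GCF's exact formulation.

```python
# A hedged sketch of the CNN + graph-convolution pattern described by GCF.
# The k-NN similarity graph and layer sizes are assumptions; the abstract
# does not specify GCF's exact graph construction or architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18, ResNet18_Weights


class GraphConv(nn.Module):
    """One GCN layer: symmetrically normalized adjacency times features."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        a_hat = adj + torch.eye(adj.size(0), device=adj.device)  # add self-loops
        d_inv_sqrt = a_hat.sum(dim=1).pow(-0.5)
        norm = d_inv_sqrt.unsqueeze(1) * a_hat * d_inv_sqrt.unsqueeze(0)
        return self.lin(norm @ x)


class GCFSketch(nn.Module):
    def __init__(self, num_classes: int = 7, k: int = 5):
        super().__init__()
        self.backbone = resnet18(weights=ResNet18_Weights.DEFAULT)
        self.backbone.fc = nn.Identity()  # keep the 512-d local CNN features
        self.gcn = GraphConv(512, 256)
        self.head = nn.Linear(512 + 256, num_classes)
        self.k = k

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        local = self.backbone(images)                     # (B, 512)
        sim = F.cosine_similarity(local.unsqueeze(1), local.unsqueeze(0), dim=-1)
        # Connect each sample to its k most similar batch neighbours
        # (k + 1 because the top match is the sample itself).
        topk = sim.topk(self.k + 1, dim=1).indices
        adj = torch.zeros_like(sim).scatter_(1, topk, 1.0)
        adj = ((adj + adj.t()) > 0).float()               # symmetrize
        global_feat = F.relu(self.gcn(local, adj))        # (B, 256)
        return self.head(torch.cat([local, global_feat], dim=1))


logits = GCFSketch()(torch.randn(8, 3, 224, 224))  # (8, num_classes)
```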