IBM T.J. Watson Research Center
Abstract: While Annealing Machines (AM) have shown increasing capabilities in solving complex combinatorial problems, positioning themselves as a more immediate alternative to the expected advances of future fully quantum solutions, they still face scaling limitations. In parallel, Graph Neural Networks (GNN) have recently been adapted to solve combinatorial problems, showing competitive results and potentially high scalability due to their distributed nature. We propose a merging approach that aims to retain both the accuracy exhibited by AMs and the representational flexibility and scalability of GNNs. Our model comprises a compression step, followed by a supervised interaction in which partial solutions obtained from the AM guide local GNNs, whose node feature representations are combined to initialize an additional GNN-based solver that handles the target problem on the original graph. Intuitively, the AM solves the combinatorial problem indirectly by infusing its knowledge into the GNN. Experiments on canonical optimization problems show that the idea is feasible, effectively allowing the AM to solve problems of sizes beyond its original limits.
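A minimal sketch of the knowledge-infusion step, assuming a MaxCut instance and PyTorch; the annealing machine is mocked with plain simulated annealing, and the layer sizes, features, and training loop are illustrative assumptions rather than the paper's actual pipeline:

import torch
import torch.nn as nn

def mock_am_maxcut(adj: torch.Tensor, steps: int = 2000, temp: float = 1.0) -> torch.Tensor:
    """Stand-in for an annealing machine: simulated annealing on MaxCut."""
    n = adj.shape[0]
    spins = torch.randint(0, 2, (n,)) * 2 - 1          # random +/-1 assignment
    for t in range(steps):
        i = int(torch.randint(0, n, (1,)))
        delta_e = -2.0 * spins[i] * (adj[i] * spins).sum()   # energy change if spin i flips
        T = max(temp * (1 - t / steps), 1e-3)                # linear cooling schedule
        if delta_e < 0 or torch.rand(1).item() < torch.exp(-delta_e / T).item():
            spins[i] = -spins[i]
    return (spins > 0).long()                          # binary partition labels

class TinyGNN(nn.Module):
    """One round of mean-neighbor message passing; returns embeddings and logits."""
    def __init__(self, dim: int = 16):
        super().__init__()
        self.embed, self.msg, self.out = nn.Linear(1, dim), nn.Linear(dim, dim), nn.Linear(dim, 2)
    def forward(self, adj, x):
        h = torch.relu(self.embed(x))
        deg = adj.sum(1, keepdim=True).clamp(min=1)
        h = torch.relu(self.msg(adj @ h / deg) + h)    # aggregate neighbors, keep residual
        return h, self.out(h)

n = 12
adj = (torch.rand(n, n) < 0.3).float()
adj = torch.triu(adj, 1); adj = adj + adj.T            # random undirected graph
labels = mock_am_maxcut(adj)                           # "partial solution" from the AM
gnn, x = TinyGNN(), adj.sum(1, keepdim=True)           # degree as a trivial input feature
opt = torch.optim.Adam(gnn.parameters(), lr=1e-2)
for _ in range(200):                                   # supervise the local GNN on AM labels
    emb, logits = gnn(adj, x)
    loss = nn.functional.cross_entropy(logits, labels)
    opt.zero_grad(); loss.backward(); opt.step()
emb, _ = gnn(adj, x)                                   # AM-informed features for a global solver
print("AM-guided node embeddings:", emb.shape)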
Abstract: Stock market forecasting has been a topic of extensive research aimed at providing investors with optimal stock recommendations for higher returns. In recent years, this field has gained even more attention due to the widespread adoption of deep learning models. While these models have achieved impressive accuracy in predicting stock behavior, tailoring them to specific scenarios has become increasingly important. Election Day represents one such critical scenario, characterized by intensified market volatility, as the winning candidate's policies significantly impact various economic sectors and companies. To address this challenge, we propose the Election Day Stock Market Forecasting (EDSMF) model. Our approach leverages the contextual capabilities of large language models alongside specialized agents designed to analyze the political and economic consequences of elections. Building on a state-of-the-art architecture, we demonstrate that EDSMF improves predictive performance on the S&P 500 during this uniquely volatile day.
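A hedged sketch of the agent idea, with the LLM call mocked: a "political analyst" agent turns an election-day headline into a signed per-sector impact score that is appended to the usual price features. The prompt wording, sector granularity, and feature layout are assumptions, not EDSMF's actual design:

import numpy as np

PROMPT = ("You are a political analyst. Given the election-day headline below, "
          "rate the expected impact on the {sector} sector from -1 (very negative) "
          "to 1 (very positive). Headline: {headline}")

def mock_llm_agent(prompt: str) -> float:
    """Stand-in for an LLM call; returns a signed impact score."""
    return 0.4 if "tax cut" in prompt.lower() else -0.2

def election_features(price_window: np.ndarray, headline: str, sector: str) -> np.ndarray:
    score = mock_llm_agent(PROMPT.format(sector=sector, headline=headline))
    return np.append(price_window, score)    # political signal as one extra feature

window = np.random.randn(30)                 # 30-day return window (toy)
feats = election_features(window, "Candidate proposes corporate tax cut", "Energy")
print(feats.shape, feats[-1])                # (31,) with the agent score appended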
Abstract: Electroencephalography (EEG) signals provide critical insights for applications in disease diagnosis and healthcare. However, the scarcity of labeled EEG data poses a significant challenge. Foundation models offer a promising solution by leveraging large-scale unlabeled data through pre-training, enabling strong performance across diverse tasks. While both temporal dynamics and inter-channel relationships are vital for understanding EEG signals, existing EEG foundation models primarily focus on the former, overlooking the latter. To address this limitation, we propose a novel foundation model for EEG that integrates both temporal and inter-channel information. Our architecture combines Graph Neural Networks (GNNs), which effectively capture relational structures, with a masked autoencoder to enable efficient pre-training. We evaluated our approach using three downstream tasks and experimented with various GNN architectures. The results demonstrate that our proposed model, particularly when employing the GCN architecture with optimized configurations, consistently outperformed baseline methods across all tasks. These findings suggest that our model serves as a robust foundation model for EEG analysis.
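A minimal sketch of the pre-training recipe described above, in PyTorch: mask a subset of EEG channels, encode with GCN layers over a channel graph, and reconstruct the masked signals. The montage size, masking ratio, and channel graph are illustrative assumptions, not the paper's configuration:

import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """Plain GCN layer: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)
    def forward(self, adj, h):
        a = adj + torch.eye(adj.shape[0])
        d = a.sum(1).rsqrt()
        return torch.relu(self.lin((d[:, None] * a * d[None, :]) @ h))

n_ch, t_len, hid = 19, 256, 64                  # 19-channel montage (assumed)
adj = (torch.rand(n_ch, n_ch) < 0.3).float()    # placeholder channel graph
adj = ((adj + adj.T) > 0).float().fill_diagonal_(0)

enc1, enc2, decoder = GCNLayer(t_len, hid), GCNLayer(hid, hid), nn.Linear(hid, t_len)
mask_token = nn.Parameter(torch.zeros(t_len))
opt = torch.optim.Adam([mask_token, *enc1.parameters(), *enc2.parameters(),
                        *decoder.parameters()], lr=1e-3)

x = torch.randn(n_ch, t_len)                    # one EEG window: channels x time
mask = torch.rand(n_ch) < 0.4                   # hide ~40% of channels
mask[0] = True                                  # ensure at least one masked channel
x_in = torch.where(mask[:, None], mask_token.expand_as(x), x)
h = enc2(adj, enc1(adj, x_in))                  # GNN mixes information across channels
recon = decoder(h)
loss = ((recon[mask] - x[mask]) ** 2).mean()    # reconstruct only the masked channels
loss.backward(); opt.step()
print(f"masked-channel reconstruction loss: {loss.item():.3f}")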
Abstract: In diagnosing mental diseases from electroencephalography (EEG) data, neural network models such as Transformers have been employed to capture temporal dynamics. It is also crucial to learn the spatial relationships between EEG sensors, for which Graph Neural Networks (GNNs) are commonly used. However, fine-tuning large, complex neural network models to capture both temporal and spatial features simultaneously increases computational costs due to the larger number of trainable parameters, and the limited availability of EEG datasets for downstream tasks makes it challenging to fine-tune such large models effectively. We propose EEG-GraphAdapter (EGA), a parameter-efficient fine-tuning (PEFT) approach that addresses these challenges. EGA is integrated into pre-trained temporal backbone models as a GNN-based module and is fine-tuned on its own while the backbone model parameters stay frozen. This enables the acquisition of spatial representations of EEG signals for downstream tasks, significantly reducing computational overhead and data requirements. Experimental evaluations on the healthcare-related downstream tasks of Major Depressive Disorder detection and Abnormality Detection demonstrate that EGA improves performance by up to 16.1% in F1-score compared with the backbone BENDR model.
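A minimal PEFT sketch in the spirit of EGA, in PyTorch: the temporal backbone (a toy GRU here, standing in for BENDR) stays frozen, while only a small residual GNN adapter over the sensor graph and a task head receive gradients. All sizes and the adjacency are illustrative assumptions:

import torch
import torch.nn as nn

n_ch, t_len, feat = 19, 128, 32

backbone = nn.GRU(input_size=1, hidden_size=feat, batch_first=True)
for p in backbone.parameters():
    p.requires_grad = False                    # backbone stays frozen

class GraphAdapter(nn.Module):
    """Residual mean-aggregation GNN layer over EEG channels."""
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(dim, dim)
    def forward(self, adj, h):                 # h: (n_ch, dim)
        deg = adj.sum(1, keepdim=True).clamp(min=1)
        return h + torch.relu(self.lin(adj @ h / deg))  # residual keeps backbone features

adapter, head = GraphAdapter(feat), nn.Linear(feat, 2)  # e.g., MDD vs. control
opt = torch.optim.Adam(list(adapter.parameters()) + list(head.parameters()), lr=1e-3)

adj = (torch.rand(n_ch, n_ch) < 0.3).float()   # placeholder sensor adjacency
x = torch.randn(n_ch, t_len, 1)                # one EEG window, channels as batch
_, h_n = backbone(x)                           # (1, n_ch, feat): per-channel temporal feature
h = adapter(adj, h_n.squeeze(0))               # spatial mixing; only adapter gets gradients
logits = head(h.mean(0, keepdim=True))         # pool channels -> clip-level prediction
loss = nn.functional.cross_entropy(logits, torch.tensor([1]))
loss.backward(); opt.step()
print("trainable adapter params:", sum(p.numel() for p in adapter.parameters()))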
Abstract: Large Language Models are applied to recommendation tasks such as suggesting items to buy and news articles to read. Point-of-Interest recommendation is a comparatively new application of sequential recommendation based on language representations of multimodal datasets. As a first step toward proving our concept, we focused on restaurant recommendation based on each user's past visit history. When choosing the next restaurant to visit, a user would consider the genre and location of the venue and, if available, pictures of dishes served there. We created a pseudo restaurant check-in history dataset from the Foursquare dataset and the FoodX-251 dataset by converting pictures into text descriptions with a multimodal model called LLaVA, and used a language-based sequential recommendation framework named Recformer, proposed in 2023. A model trained on this semi-multimodal dataset outperformed a model trained on the same dataset without picture descriptions. This suggests that the semi-multimodal model reflects actual human behaviour and that we are on the right path toward a multimodal recommendation model.
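A minimal sketch of the data-construction idea: each check-in becomes an item "sentence" built from venue attributes plus a dish-photo caption, and candidates are ranked by similarity to the user's history. The captioner and text encoder below are toy stand-ins for LLaVA and Recformer, and all names are illustrative:

import numpy as np

def mock_llava_caption(image_id: str) -> str:
    """Stand-in for LLaVA: returns a text description of a dish photo."""
    return {"img1": "a bowl of tonkotsu ramen with a soft-boiled egg",
            "img2": "a plate of assorted nigiri sushi"}.get(image_id, "a dish")

def item_sentence(venue: dict) -> str:
    parts = [f"genre: {venue['genre']}", f"location: {venue['location']}"]
    if venue.get("photo"):                     # semi-multimodal: add caption when available
        parts.append(f"dish: {mock_llava_caption(venue['photo'])}")
    return "; ".join(parts)

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy hashing bag-of-words encoder (placeholder for a language model)."""
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    return v / (np.linalg.norm(v) + 1e-8)

history = [{"genre": "ramen", "location": "Shibuya", "photo": "img1"}]
candidates = [{"genre": "ramen", "location": "Shinjuku", "photo": "img1"},
              {"genre": "sushi", "location": "Ginza", "photo": "img2"}]
user_vec = np.mean([embed(item_sentence(v)) for v in history], axis=0)
scores = [float(user_vec @ embed(item_sentence(c))) for c in candidates]
print(sorted(zip(scores, [c["genre"] for c in candidates]), reverse=True))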
Abstract: In recent years, Recommender Systems (RS) have witnessed a transformative shift with the advent of Large Language Models (LLMs) in the field of Natural Language Processing (NLP). Models such as GPT-3.5/4 and Llama have demonstrated unprecedented capabilities in understanding and generating human-like text. The extensive information captured during pre-training gives these LLMs the potential to extract more profound semantic representations from the contextual information of users and items. Despite this great potential, the challenge of leveraging user-item preferences from contextual information, and of aligning them with improvements to Recommender Systems, still needs to be addressed. Believing that a better understanding of the user or item itself can be the key factor in improving recommendation performance, we conduct research on generating informative profiles using state-of-the-art LLMs. To harness the linguistic abilities of LLMs for Recommender Systems, we introduce the Prompting-Based Representation Learning Method for Recommendation (P4R). In the P4R framework, we use an LLM prompting strategy to create personalized item profiles, which are then transformed into semantic representation spaces using a pre-trained BERT model for text embedding. Furthermore, we incorporate a Graph Convolutional Network (GCN) for collaborative filtering representation. P4R aligns these two embedding spaces to address general recommendation tasks. In our evaluation, we compare P4R with state-of-the-art recommender models and assess the quality of prompt-based profile generation.
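A minimal sketch of the alignment idea, in PyTorch: LLM-derived item-profile embeddings (random placeholders here, standing in for BERT encodings of generated profiles) are projected into the collaborative-filtering space produced by a LightGCN-style propagation, and matching pairs are pulled together alongside the recommendation loss. Dimensions and loss weights are assumptions:

import torch
import torch.nn as nn

n_users, n_items, d_text, d_cf = 8, 10, 32, 16
inter = (torch.rand(n_users, n_items) < 0.3).float()     # user-item interactions

profile_emb = torch.randn(n_items, d_text)               # "BERT(profile)" placeholder
user_emb = nn.Parameter(torch.randn(n_users, d_cf) * 0.1)
item_emb = nn.Parameter(torch.randn(n_items, d_cf) * 0.1)
proj = nn.Linear(d_text, d_cf)                           # text space -> CF space
opt = torch.optim.Adam([user_emb, item_emb] + list(proj.parameters()), lr=1e-2)

def gcn_propagate(u, i, inter):
    """One LightGCN-style propagation step over the bipartite graph."""
    du = inter.sum(1, keepdim=True).clamp(min=1)
    di = inter.sum(0, keepdim=True).clamp(min=1).T
    return (inter @ i) / du, (inter.T @ u) / di           # neighbor averages

for _ in range(100):
    u, i = gcn_propagate(user_emb, item_emb, inter)
    rec_loss = ((torch.sigmoid(u @ i.T) - inter) ** 2).mean()
    align = 1 - nn.functional.cosine_similarity(proj(profile_emb), i).mean()
    loss = rec_loss + 0.5 * align                         # joint CF + alignment objective
    opt.zero_grad(); loss.backward(); opt.step()
print(f"final joint loss: {loss.item():.3f}")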
Abstract: In news recommendation systems, reducing popularity bias is essential for delivering accurate and diverse recommendations. This paper presents POPK, a new method that uses temporal-counterfactual analysis to mitigate the influence of popular news articles. By asking, "What if, at a given time $t$, a set of popular news articles were competing for the user's attention to be clicked?", POPK aims to improve recommendation accuracy and diversity. We argue that popular news articles always compete for a user's attention, even when they do not explicitly appear in the user's impression list; POPK therefore systematically eliminates their implicit influence during each training step, combining counterfactual reasoning with a temporal approach to adjust the negative sample space and refine the model's understanding of user interests. We tested POPK on datasets in three different languages (Japanese, English, and Norwegian) and found that it consistently enhances traditional methods. POPK can also be customized to favor either accuracy or diversity, and it supports distinct ways of measuring popularity. Our findings underscore how POPK effectively enhances the accuracy and diversity of recommended articles while allowing the approach to be tailored to specific needs.
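A minimal sketch of the counterfactual negative-sampling step: at training time t, the top-K most-clicked articles at t are injected as extra negatives in a sampled softmax, as if they had competed for the click. The scoring model and all sizes are illustrative, not the paper's implementation:

import numpy as np

rng = np.random.default_rng(0)
n_articles, d, K = 100, 16, 5
emb = rng.normal(size=(n_articles, d)) * 0.1             # article embeddings
clicks_at = {t: rng.integers(0, n_articles, 50) for t in range(3)}  # click log by time

def popk_negatives(t: int, clicked: int, k: int = K) -> np.ndarray:
    """Top-k most-clicked articles at time t, excluding the clicked one."""
    counts = np.bincount(clicks_at[t], minlength=n_articles)
    top = np.argsort(-counts)
    return top[top != clicked][:k]

def loss_with_popk(user_vec, clicked, t):
    negs = np.concatenate([popk_negatives(t, clicked),
                           rng.integers(0, n_articles, 5)])  # popular + random negatives
    logits = emb[np.concatenate(([clicked], negs))] @ user_vec
    logits -= logits.max()                                   # numerical stability
    p = np.exp(logits) / np.exp(logits).sum()
    return -np.log(p[0] + 1e-12)            # sampled softmax: clicked item vs. negatives

user_vec = rng.normal(size=d)
print(f"loss at t=0: {loss_with_popk(user_vec, clicked=3, t=0):.3f}")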
Abstract: In recent years, journalists have expressed concerns about the increasing trend of news article avoidance, especially within specific domains, an issue exacerbated by the rise of recommender systems. Our research indicates that recommender systems should treat avoidance as a fundamental factor. We argue that news articles can be characterized by three principal, closely interconnected elements: exposure, relevance, and avoidance. To address these challenges, we introduce AWRS, an Avoidance-Aware Recommender System. This framework incorporates avoidance awareness when recommending news, based on the premise that news article avoidance conveys significant information about user preferences. Evaluation results on three news datasets in different languages (English, Norwegian, and Japanese) demonstrate that our method outperforms existing approaches.
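Since the abstract names the three elements but not how AWRS combines them, the sketch below is purely an assumed illustration: avoidance is estimated as the smoothed rate of exposures without clicks and used to modulate a relevance score:

import numpy as np

def avoidance_rate(exposures: int, clicks: int) -> float:
    """Fraction of times an article was shown but not clicked (Laplace-smoothed)."""
    return (exposures - clicks + 1) / (exposures + 2)

def avoidance_aware_score(relevance: float, exposures: int, clicks: int) -> float:
    # Down-weight items the user repeatedly saw and skipped; the avoidance
    # signal itself carries information about preference.
    return relevance * (1.0 - avoidance_rate(exposures, clicks))

for exposures, clicks in [(10, 8), (10, 1), (0, 0)]:
    print(exposures, clicks, round(avoidance_aware_score(0.9, exposures, clicks), 3))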
Abstract: This paper introduces LLM-jp, a cross-organizational project for the research and development of Japanese large language models (LLMs). LLM-jp aims to develop strong, open-source Japanese LLMs, and as of this writing, more than 1,500 participants from academia and industry are working together toward this goal. This paper presents the background of the establishment of LLM-jp, summaries of its activities, and technical reports on the LLMs developed by LLM-jp. For the latest activities, visit https://llm-jp.nii.ac.jp/en/.
Abstract: Large language models (LLMs) have recently transformed natural language processing, enabling machines to generate human-like text and engage in meaningful conversations. This development demands speed, efficiency, and accessibility in LLM inference as the computational and memory requirements of these systems grow rapidly. Meanwhile, advances in computing and memory capabilities are lagging behind, a gap exacerbated by the slowing of Moore's law. With LLMs exceeding the capacity of single GPUs, they require complex, expert-level configurations for parallel processing. Memory accesses become significantly more expensive than computation, posing a challenge for efficient scaling known as the memory wall. Here, compute-in-memory (CIM) technologies offer a promising solution for accelerating AI inference by performing analog computations directly in memory, potentially reducing latency and power consumption. By closely integrating memory and compute elements, CIM alleviates the von Neumann bottleneck, reducing data movement and improving energy efficiency. This survey provides an overview and analysis of transformer-based models, reviewing various CIM architectures and exploring how they can address the imminent challenges of modern AI computing systems. We discuss transformer-related operators and their hardware acceleration schemes, and highlight challenges, trends, and insights in the corresponding CIM designs.
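A back-of-envelope check of the memory-wall claim for single-token LLM decode: each weight is read once per matrix-vector product, so arithmetic intensity sits near 1 FLOP/byte in fp16, far below the compute-to-bandwidth ridge of a modern GPU (hardware figures below are illustrative, roughly A100-class):

d = 12288                       # hidden size of a large transformer (GPT-3 scale)
flops = 2 * d * d               # multiply-accumulate count for one GEMV
bytes_moved = 2 * d * d         # fp16 weights read once (activations negligible)
intensity = flops / bytes_moved
print(f"GEMV arithmetic intensity: {intensity:.1f} FLOP/byte")

peak_flops = 312e12             # ~fp16 tensor-core peak, FLOP/s (illustrative)
peak_bw = 2.0e12                # ~HBM bandwidth, bytes/s (illustrative)
ridge = peak_flops / peak_bw    # intensity needed to become compute-bound
print(f"GPU ridge point: {ridge:.0f} FLOP/byte -> decode is memory-bound")
print(f"utilization ceiling: {intensity / ridge:.1%} of peak compute")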