Abstract:In conversational recommender systems (CRSs), conversations usually involve a set of items and item-related entities or attributes, e.g., director is a related entity of a movie. These items and item-related entities are often mentioned along the development of a dialog, leading to potential sequential dependencies among them. However, most of existing CRSs neglect these potential sequential dependencies. In this article, we first propose a Transformer-based sequential conversational recommendation method, named TSCR, to model the sequential dependencies in the conversations to improve CRS. In TSCR, we represent conversations by items and the item-related entities, and construct user sequences to discover user preferences by considering both the mentioned items and item-related entities. Based on the constructed sequences, we deploy a Cloze task to predict the recommended items along a sequence. Meanwhile, in certain domains, knowledge graphs formed by the items and their related entities are readily available, which provide various different kinds of associations among them. Given that TSCR does not benefit from such knowledge graphs, we then propose a knowledge graph enhanced version of TSCR, called TSCRKG. In specific, we leverage the knowledge graph to offline initialize our model TSCRKG, and augment the user sequence of conversations (i.e., sequence of the mentioned items and item-related entities in the conversation) with multi-hop paths in the knowledge graph. Experimental results demonstrate that our TSCR model significantly outperforms state-of-the-art baselines, and the enhanced version TSCRKG further improves recommendation performance on top of TSCR.
Abstract:Online shopping platforms, such as Amazon and AliExpress, are increasingly prevalent in society, helping customers purchase products conveniently. With recent progress in natural language processing, researchers and practitioners shift their focus from traditional product search to conversational product search. Conversational product search enables user-machine conversations and through them collects explicit user feedback that allows to actively clarify the users' product preferences. Therefore, prospective research on an intelligent shopping assistant via conversations is indispensable. Existing publications on conversational product search either model conversations independently from users, queries, and products or lead to a vocabulary mismatch. In this work, we propose a new conversational product search model, ConvPS, to assist users in locating desirable items. The model is first trained to jointly learn the semantic representations of user, query, item, and conversation via a unified generative framework. After learning these representations, they are integrated to retrieve the target items in the latent semantic space. Meanwhile, we propose a set of greedy and explore-exploit strategies to learn to ask the user a sequence of high-performance questions for conversations. Our proposed ConvPS model can naturally integrate the representation learning of the user, query, item, and conversation into a unified generative framework, which provides a promising avenue for constructing accurate and robust conversational product search systems that are flexible and adaptive. Experimental results demonstrate that our ConvPS model significantly outperforms state-of-the-art baselines.
Abstract:In this paper, we introduce Nested Low-Rank Adaptation (NoRA), a novel approach to parameter-efficient fine-tuning that extends the capabilities of Low-Rank Adaptation (LoRA) techniques. Vanilla LoRA overlooks pre-trained weight inheritance and still requires fine-tuning numerous parameters. To addresses these issues, our NoRA adopts a dual-layer nested structure with Singular Value Decomposition (SVD), effectively leveraging original matrix knowledge while reducing tunable parameters. Specifically, NoRA freezes the outer LoRA weights and utilizes an inner LoRA design, providing enhanced control over model optimization. This approach allows the model to more precisely adapt to specific tasks while maintaining a compact parameter space. By freezing outer LoRA weights and using an inner LoRA design, NoRA enables precise task adaptation with a compact parameter space. Evaluations on tasks including commonsense reasoning with large language models, fine-tuning vision-language models, and subject-driven generation demonstrate NoRA's superiority over LoRA and its variants. Notably, NoRA reduces fine-tuning parameters|training-time|memory-usage by 4\%|22.5\%|20.7\% compared to LoRA on LLaMA-3 8B, while achieving 2.2\% higher performance. Code will be released upon acceptance.
Abstract:IMPORTANCE The response effectiveness of different large language models (LLMs) and various individuals, including medical students, graduate students, and practicing physicians, in pediatric ophthalmology consultations, has not been clearly established yet. OBJECTIVE Design a 100-question exam based on pediatric ophthalmology to evaluate the performance of LLMs in highly specialized scenarios and compare them with the performance of medical students and physicians at different levels. DESIGN, SETTING, AND PARTICIPANTS This survey study assessed three LLMs, namely ChatGPT (GPT-3.5), GPT-4, and PaLM2, were assessed alongside three human cohorts: medical students, postgraduate students, and attending physicians, in their ability to answer questions related to pediatric ophthalmology. It was conducted by administering questionnaires in the form of test papers through the LLM network interface, with the valuable participation of volunteers. MAIN OUTCOMES AND MEASURES Mean scores of LLM and humans on 100 multiple-choice questions, as well as the answer stability, correlation, and response confidence of each LLM. RESULTS GPT-4 performed comparably to attending physicians, while ChatGPT (GPT-3.5) and PaLM2 outperformed medical students but slightly trailed behind postgraduate students. Furthermore, GPT-4 exhibited greater stability and confidence when responding to inquiries compared to ChatGPT (GPT-3.5) and PaLM2. CONCLUSIONS AND RELEVANCE Our results underscore the potential for LLMs to provide medical assistance in pediatric ophthalmology and suggest significant capacity to guide the education of medical students.
Abstract:Purpose: The performance of three different large language models (LLMS) (GPT-3.5, GPT-4, and PaLM2) in answering ophthalmology professional questions was evaluated and compared with that of three different professional populations (medical undergraduates, medical masters, and attending physicians). Methods: A 100-item ophthalmology single-choice test was administered to three different LLMs (GPT-3.5, GPT-4, and PaLM2) and three different professional levels (medical undergraduates, medical masters, and attending physicians), respectively. The performance of LLM was comprehensively evaluated and compared with the human group in terms of average score, stability, and confidence. Results: Each LLM outperformed undergraduates in general, with GPT-3.5 and PaLM2 being slightly below the master's level, while GPT-4 showed a level comparable to that of attending physicians. In addition, GPT-4 showed significantly higher answer stability and confidence than GPT-3.5 and PaLM2. Conclusion: Our study shows that LLM represented by GPT-4 performs better in the field of ophthalmology. With further improvements, LLM will bring unexpected benefits in medical education and clinical decision making in the near future.
Abstract:More and more stock trading strategies are constructed using deep reinforcement learning (DRL) algorithms, but DRL methods originally widely used in the gaming community are not directly adaptable to financial data with low signal-to-noise ratios and unevenness, and thus suffer from performance shortcomings. In this paper, to capture the hidden information, we propose a DRL based stock trading system using cascaded LSTM, which first uses LSTM to extract the time-series features from stock daily data, and then the features extracted are fed to the agent for training, while the strategy functions in reinforcement learning also use another LSTM for training. Experiments in DJI in the US market and SSE50 in the Chinese stock market show that our model outperforms previous baseline models in terms of cumulative returns and Sharp ratio, and this advantage is more significant in the Chinese stock market, a merging market. It indicates that our proposed method is a promising way to build a automated stock trading system.
Abstract:In this work we propose a new task: artistic visualization of classical Chinese poems, where the goal is to generatepaintings of a certain artistic style for classical Chinese poems. For this purpose, we construct a new dataset called Paint4Poem. Thefirst part of Paint4Poem consists of 301 high-quality poem-painting pairs collected manually from an influential modern Chinese artistFeng Zikai. As its small scale poses challenges for effectively training poem-to-painting generation models, we introduce the secondpart of Paint4Poem, which consists of 3,648 caption-painting pairs collected manually from Feng Zikai's paintings and 89,204 poem-painting pairs collected automatically from the web. We expect the former to help learning the artist painting style as it containshis most paintings, and the latter to help learning the semantic relevance between poems and paintings. Further, we analyze Paint4Poem regarding poem diversity, painting style, and the semantic relevance between poems and paintings. We create abenchmark for Paint4Poem: we train two representative text-to-image generation models: AttnGAN and MirrorGAN, and evaluate theirperformance regarding painting pictorial quality, painting stylistic relevance, and semantic relevance between poems and paintings.The results indicate that the models are able to generate paintings that have good pictorial quality and mimic Feng Zikai's style, but thereflection of poem semantics is limited. The dataset also poses many interesting research directions on this task, including transferlearning, few-shot learning, text-to-image generation for low-resource data etc. The dataset is publicly available.(https://github.com/paint4poem/paint4poem)
Abstract:Fermion sampling is to generate probability distribution of a many-body Slater-determinant wavefunction, which is termed "determinantal point process" in statistical analysis. For its inherently-embedded Pauli exclusion principle, its application reaches beyond simulating fermionic quantum many-body physics to constructing machine learning models for diversified datasets. Here we propose a fermion sampling algorithm, which has a polynomial time-complexity -- quadratic in the fermion number and linear in the system size. This algorithm is about 100% more efficient in computation time than the best known algorithms. In sampling the corresponding marginal distribution, our algorithm has a more drastic improvement, achieving a scaling advantage. We demonstrate its power on several test applications, including sampling fermions in a many-body system and a machine learning task of text summarization, and confirm its improved computation efficiency over other methods by counting floating-point operations.
Abstract:Most conversational recommendation approaches are either not explainable, or they require external user's knowledge for explaining or their explanations cannot be applied in real time due to computational limitations. In this work, we present a real time category based conversational recommendation approach, which can provide concise explanations without prior user knowledge being required. We first perform an explainable user model in the form of preferences over the items' categories, and then use the category preferences to recommend items. The user model is performed by applying a BERT-based neural architecture on the conversation. Then, we translate the user model into item recommendation scores using a Feed Forward Network. User preferences during the conversation in our approach are represented by category vectors which are directly interpretable. The experimental results on the real conversational recommendation dataset ReDial demonstrate comparable performance to the state-of-the-art, while our approach is explainable. We also show the potential power of our framework by involving an oracle setting of category preference prediction.
Abstract:Search and recommender systems that take the initiative to ask clarifying questions to better understand users' information needs are receiving increasing attention from the research community. However, to the best of our knowledge, there is no empirical study to quantify whether and to what extent users are willing or able to answer these questions. In this work, we conduct an online experiment by deploying an experimental system, which interacts with users by asking clarifying questions against a product repository. We collect both implicit interaction behavior data and explicit feedback from users showing that: (a) users are willing to answer a good number of clarifying questions (11-21 on average), but not many more than that; (b) most users answer questions until they reach the target product, but also a fraction of them stops due to fatigue or due to receiving irrelevant questions; (c) part of the users' answers (12-17%) are actually opposite to the description of the target product; while (d) most of the users (66-84%) find the question-based system helpful towards completing their tasks. Some of the findings of the study contradict current assumptions on simulated evaluations in the field, while they point towards improvements in the evaluation framework and can inspire future interactive search/recommender system designs.