Abstract: Conversational recommendation systems (CRS) leverage contextual information from conversations to generate recommendations but often struggle due to a lack of collaborative filtering (CF) signals, which capture the user-item interaction patterns essential for accurate recommendations. We introduce Reddit-ML32M, a dataset that links Reddit conversations with interactions on MovieLens 32M, to enrich item representations with collaborative knowledge and to address interaction sparsity in conversational datasets. We propose an LLM-based framework that uses Reddit-ML32M to align LLM-generated recommendations with CF embeddings, refining rankings for better performance. We evaluate our framework against three sets of baselines: CF-based recommenders using only interactions from CRS tasks, traditional CRS models, and LLM-based methods relying on conversational context without item representations. Our approach achieves consistent improvements, including a 12.32% increase in Hit Rate and a 9.9% improvement in NDCG over the best-performing baseline, which relies on conversational context but lacks collaborative item representations.
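For readers unfamiliar with the two reported metrics, the following is a minimal sketch of Hit Rate@k and NDCG@k under binary relevance. The abstract does not state the cutoff k or the exact metric variant, so the function names, the binary-relevance assumption, and the toy item IDs are illustrative assumptions, not the authors' evaluation code.

```python
import math

def hit_rate_at_k(ranked, relevant, k):
    """Return 1.0 if any relevant item appears in the top-k of the ranking, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG@k: DCG of the ranking divided by the ideal DCG."""
    # Position 0 gets discount 1/log2(2) = 1, position 1 gets 1/log2(3), and so on.
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    # The ideal ranking places every relevant item (up to k of them) first.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical example: one ground-truth movie, ranked second by the recommender.
ranked_items = ["m3", "m1", "m7"]
ground_truth = {"m1"}
print(hit_rate_at_k(ranked_items, ground_truth, k=2))
print(ndcg_at_k(ranked_items, ground_truth, k=3))
```

Hit Rate only records whether a relevant item made the cutoff, while NDCG also rewards placing it nearer the top, which is why a reranking step can move the two metrics by different amounts.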
Abstract: Customers reach out to online live chat agents with various intents, such as asking about product details or requesting a return. In this paper, we introduce the problem of predicting user intent from browsing history and address it with a two-stage approach. The first stage classifies a user's browsing history into high-level intent categories. Here, we represent each browsing history as a text sequence of page attributes and use the ground-truth class labels to fine-tune pretrained Transformers. The second stage provides a large language model (LLM) with the browsing history and the predicted intent class to generate fine-grained intents. For automatic evaluation, we use a separate LLM to judge the similarity between generated and ground-truth intents; its judgments closely align with human judgments. Our two-stage approach yields significant performance gains compared to generating intents without the classification stage.
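The data flow between the two stages can be sketched as follows. This is only an illustration of the idea of turning a browsing history into a text sequence and passing it, together with a predicted class, to a second stage. The attribute names (`type`, `title`), the separator token, the prompt wording, and both function names are hypothetical and are not taken from the paper; no classifier or LLM is called.

```python
def serialize_history(pages, sep=" [SEP] "):
    """Turn a list of page-attribute dicts into one text sequence."""
    parts = []
    for page in pages:
        # Sort keys so the same history always produces the same text.
        parts.append(", ".join(f"{key}: {page[key]}" for key in sorted(page)))
    return sep.join(parts)

def build_stage_two_prompt(history_text, predicted_class):
    """Combine the serialized history and the stage-one class into a prompt string."""
    return (
        "Browsing history: " + history_text + "\n"
        "Predicted intent category: " + predicted_class + "\n"
        "Describe the customer's specific intent in one sentence."
    )

# Hypothetical two-page browsing history.
history = [
    {"type": "product", "title": "Blue Kettle"},
    {"type": "returns", "title": "Return Policy"},
]
text = serialize_history(history)
print(text)
print(build_stage_two_prompt(text, "return request"))
```

In a real pipeline the serialized text would be the input to the fine-tuned classifier, and the predicted class would come from that classifier rather than being supplied by hand as it is here.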
Abstract: Conventional image signal processing (ISP) frameworks are designed to reconstruct an RGB image from a single raw measurement. As multi-camera systems become increasingly popular, it is worth exploring improvements in ISP frameworks that incorporate raw measurements from multiple cameras. This manuscript is an intermediate progress report on StereoISP, a new ISP framework under development. It employs raw measurements from a stereo camera pair to generate a demosaicked, denoised RGB image by utilizing the disparity estimated between the two views. We investigate StereoISP by testing its performance on raw image pairs synthesized from stereo datasets. Our preliminary results show an improvement in the PSNR of the reconstructed RGB image of at least 2 dB on the KITTI 2015 and DrivingStereo datasets when using ground-truth sparse disparity maps.
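To put the reported 2 dB gain in context, the sketch below computes PSNR from its standard definition, 10 * log10(peak^2 / MSE), and converts a PSNR gain into the corresponding reduction in mean squared error. The 8-bit peak value of 255, the flat-list image representation, and the function names are assumptions made for illustration; the abstract does not describe the authors' evaluation code.

```python
import math

def psnr(reference, estimate, peak=255.0):
    """PSNR in dB between two equally sized flat sequences of pixel values."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks when PSNR rises by gain_db."""
    return 10.0 ** (-gain_db / 10.0)

# Hypothetical 4-pixel example.
reference = [10, 20, 30, 40]
estimate = [12, 18, 33, 40]
print(psnr(reference, estimate))
print(mse_ratio_for_gain(2.0))
```

Because PSNR is logarithmic in the error, a 2 dB increase corresponds to the MSE falling to roughly 63% of its previous value, that is, a reduction of about 37%.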