Abstract:Customers reach out to online live chat agents with various intents, such as asking about product details or requesting a return. In this paper, we propose the problem of predicting user intent from browsing history and address it through a two-stage approach. The first stage classifies a user's browsing history into high-level intent categories. Here, we represent each browsing history as a text sequence of page attributes and use the ground-truth class labels to fine-tune pretrained Transformers. The second stage provides a large language model (LLM) with the browsing history and predicted intent class to generate fine-grained intents. For automatic evaluation, we use a separate LLM to judge the similarity between generated and ground-truth intents, which closely aligns with human judgments. Our two-stage approach yields significant performance gains compared to generating intents without the classification stage.
Abstract:Conventional image signal processing (ISP) frameworks are designed to reconstruct an RGB image from a single raw measurement. As multi-camera systems become increasingly popular these days, it is worth exploring improvements in ISP frameworks by incorporating raw measurements from multiple cameras. This manuscript is an intermediate progress report of a new ISP framework that is under development, StereoISP. It employs raw measurements from a stereo camera pair to generate a demosaicked, denoised RGB image by utilizing disparity estimated between the two views. We investigate StereoISP by testing the performance on raw image pairs synthesized from stereo datasets. Our preliminary results show an improvement in the PSNR of the reconstructed RGB image by at least 2dB on KITTI 2015 and drivingStereo datasets using ground truth sparse disparity maps.