Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Sushant Kumar

Scalable Permutation-Aware Modeling for Temporal Set Prediction

Apr 23, 2025

Ashish Ranjan, Ayush Agarwal, Shalin Barot, Sushant Kumar

Abstract:Temporal set prediction involves forecasting the elements that will appear in the next set, given a sequence of prior sets, each containing a variable number of elements. Existing methods often rely on intricate architectures with substantial computational overhead, which hampers their scalability. In this work, we introduce a novel and scalable framework that leverages permutation-equivariant and permutation-invariant transformations to efficiently model set dynamics. Our approach significantly reduces both training and inference time while maintaining competitive performance. Extensive experiments on multiple public benchmarks show that our method achieves results on par with or superior to state-of-the-art models across several evaluation metrics. These results underscore the effectiveness of our model in enabling efficient and scalable temporal set prediction.

Via

Access Paper or Ask Questions

LLM-driven Constrained Copy Generation through Iterative Refinement

Apr 14, 2025

Varun Vasudevan, Faezeh Akhavizadegan, Abhinav Prakash, Yokila Arora, Jason Cho, Tanya Mendiratta, Sushant Kumar, Kannan Achan

Abstract:Crafting a marketing message (copy), or copywriting is a challenging generation task, as the copy must adhere to various constraints. Copy creation is inherently iterative for humans, starting with an initial draft followed by successive refinements. However, manual copy creation is time-consuming and expensive, resulting in only a few copies for each use case. This limitation restricts our ability to personalize content to customers. Contrary to the manual approach, LLMs can generate copies quickly, but the generated content does not consistently meet all the constraints on the first attempt (similar to humans). While recent studies have shown promise in improving constrained generation through iterative refinement, they have primarily addressed tasks with only a few simple constraints. Consequently, the effectiveness of iterative refinement for tasks such as copy generation, which involves many intricate constraints, remains unclear. To address this gap, we propose an LLM-based end-to-end framework for scalable copy generation using iterative refinement. To the best of our knowledge, this is the first study to address multiple challenging constraints simultaneously in copy generation. Examples of these constraints include length, topics, keywords, preferred lexical ordering, and tone of voice. We demonstrate the performance of our framework by creating copies for e-commerce banners for three different use cases of varying complexity. Our results show that iterative refinement increases the copy success rate by $16.25-35.91$% across use cases. Furthermore, the copies generated using our approach outperformed manually created content in multiple pilot studies using a multi-armed bandit framework. The winning copy improved the click-through rate by $38.5-45.21$%.

* 10 pages, 2 figures, 7 Tables

Via

Access Paper or Ask Questions

Adaptive Object Detection for Indoor Navigation Assistance: A Performance Evaluation of Real-Time Algorithms

Jan 30, 2025

Abhinav Pratap, Sushant Kumar, Suchinton Chakravarty

Figure 1 for Adaptive Object Detection for Indoor Navigation Assistance: A Performance Evaluation of Real-Time Algorithms

Figure 2 for Adaptive Object Detection for Indoor Navigation Assistance: A Performance Evaluation of Real-Time Algorithms

Figure 3 for Adaptive Object Detection for Indoor Navigation Assistance: A Performance Evaluation of Real-Time Algorithms

Figure 4 for Adaptive Object Detection for Indoor Navigation Assistance: A Performance Evaluation of Real-Time Algorithms

Abstract:This study addresses the need for accurate and efficient object detection in assistive technologies for visually impaired individuals. We evaluate four real-time object detection algorithms YOLO, SSD, Faster R-CNN, and Mask R-CNN within the context of indoor navigation assistance. Using the Indoor Objects Detection dataset, we analyze detection accuracy, processing speed, and adaptability to indoor environments. Our findings highlight the trade-offs between precision and efficiency, offering insights into selecting optimal algorithms for realtime assistive navigation. This research advances adaptive machine learning applications, enhancing indoor navigation solutions for the visually impaired and promoting accessibility.

* 5 pages, 2 figures, 3 tables

Via

Access Paper or Ask Questions

Improving Sequential Recommender Systems with Online and In-store User Behavior

Dec 03, 2024

Luyi Ma, Aashika Padmanabhan, Anjana Ganesh, Shengwei Tang, Jiao Chen, Xiaohan Li, Lalitesh Morishetti, Kaushiki Nag, Malay Patel, Jason Cho(+2 more)

Figure 1 for Improving Sequential Recommender Systems with Online and In-store User Behavior

Figure 2 for Improving Sequential Recommender Systems with Online and In-store User Behavior

Figure 3 for Improving Sequential Recommender Systems with Online and In-store User Behavior

Figure 4 for Improving Sequential Recommender Systems with Online and In-store User Behavior

Abstract:Online e-commerce platforms have been extending in-store shopping, which allows users to keep the canonical online browsing and checkout experience while exploring in-store shopping. However, the growing transition between online and in-store becomes a challenge to sequential recommender systems for future online interaction prediction due to the lack of holistic modeling of hybrid user behaviors (online and in-store). The challenges are twofold. First, combining online and in-store user behavior data into a single data schema and supporting multiple stages in the model life cycle (pre-training, training, inference, etc.) organically needs a new data pipeline design. Second, online recommender systems, which solely rely on online user behavior sequences, must be redesigned to support online and in-store user data as input under the sequential modeling setting. To overcome the first challenge, we propose a hybrid, omnichannel data pipeline to compile online and in-store user behavior data by caching information from diverse data sources. Later, we introduce a model-agnostic encoder module to the sequential recommender system to interpret the user in-store transaction and augment the modeling capacity for better online interaction prediction given the hybrid user behavior.

* 6 pages, IEEE BigData 2024 Workshop

Via

Access Paper or Ask Questions

Triple Modality Fusion: Aligning Visual, Textual, and Graph Data with Large Language Models for Multi-Behavior Recommendations

Oct 16, 2024

Luyi Ma, Xiaohan Li, Zezhong Fan, Jianpeng Xu, Jason Cho, Praveen Kanumala, Kaushiki Nag, Sushant Kumar, Kannan Achan

Figure 1 for Triple Modality Fusion: Aligning Visual, Textual, and Graph Data with Large Language Models for Multi-Behavior Recommendations

Figure 2 for Triple Modality Fusion: Aligning Visual, Textual, and Graph Data with Large Language Models for Multi-Behavior Recommendations

Figure 3 for Triple Modality Fusion: Aligning Visual, Textual, and Graph Data with Large Language Models for Multi-Behavior Recommendations

Figure 4 for Triple Modality Fusion: Aligning Visual, Textual, and Graph Data with Large Language Models for Multi-Behavior Recommendations

Abstract:Integrating diverse data modalities is crucial for enhancing the performance of personalized recommendation systems. Traditional models, which often rely on singular data sources, lack the depth needed to accurately capture the multifaceted nature of item features and user behaviors. This paper introduces a novel framework for multi-behavior recommendations, leveraging the fusion of triple-modality, which is visual, textual, and graph data through alignment with large language models (LLMs). By incorporating visual information, we capture contextual and aesthetic item characteristics; textual data provides insights into user interests and item features in detail; and graph data elucidates relationships within the item-behavior heterogeneous graphs. Our proposed model called Triple Modality Fusion (TMF) utilizes the power of LLMs to align and integrate these three modalities, achieving a comprehensive representation of user behaviors. The LLM models the user's interactions including behaviors and item features in natural languages. Initially, the LLM is warmed up using only natural language-based prompts. We then devise the modality fusion module based on cross-attention and self-attention mechanisms to integrate different modalities from other models into the same embedding space and incorporate them into an LLM. Extensive experiments demonstrate the effectiveness of our approach in improving recommendation accuracy. Further ablation studies validate the effectiveness of our model design and benefits of the TMF.

Via

Access Paper or Ask Questions

Thread Detection and Response Generation using Transformers with Prompt Optimisation

Mar 09, 2024

Kevin Joshua T, Arnav Agarwal, Shriya Sanjay, Yash Sarda, John Sahaya Rani Alex, Saurav Gupta, Sushant Kumar, Vishwanath Kamath

Figure 1 for Thread Detection and Response Generation using Transformers with Prompt Optimisation

Figure 2 for Thread Detection and Response Generation using Transformers with Prompt Optimisation

Figure 3 for Thread Detection and Response Generation using Transformers with Prompt Optimisation

Abstract:Conversational systems are crucial for human-computer interaction, managing complex dialogues by identifying threads and prioritising responses. This is especially vital in multi-party conversations, where precise identification of threads and strategic response prioritisation ensure efficient dialogue management. To address these challenges an end-to-end model that identifies threads and prioritises their response generation based on the importance was developed, involving a systematic decomposition of the problem into discrete components - thread detection, prioritisation, and performance optimisation which was meticulously analysed and optimised. These refined components seamlessly integrate into a unified framework, in conversational systems. Llama2 7b is used due to its high level of generalisation but the system can be updated with any open source Large Language Model(LLM). The computational capabilities of the Llama2 model was augmented by using fine tuning methods and strategic prompting techniques to optimise the model's performance, reducing computational time and increasing the accuracy of the model. The model achieves up to 10x speed improvement, while generating more coherent results compared to existing models.

* 6 pages, 4 figures, submitted to 2024 IEEE International Conference on Signal Processing and Communications (SPCOM)

Via

Access Paper or Ask Questions

LLM-Ensemble: Optimal Large Language Model Ensemble Method for E-commerce Product Attribute Value Extraction

Feb 29, 2024

Chenhao Fang, Xiaohan Li, Zezhong Fan, Jianpeng Xu, Kaushiki Nag, Evren Korpeoglu, Sushant Kumar, Kannan Achan

Figure 1 for LLM-Ensemble: Optimal Large Language Model Ensemble Method for E-commerce Product Attribute Value Extraction

Figure 2 for LLM-Ensemble: Optimal Large Language Model Ensemble Method for E-commerce Product Attribute Value Extraction

Figure 3 for LLM-Ensemble: Optimal Large Language Model Ensemble Method for E-commerce Product Attribute Value Extraction

Abstract:Product attribute value extraction is a pivotal component in Natural Language Processing (NLP) and the contemporary e-commerce industry. The provision of precise product attribute values is fundamental in ensuring high-quality recommendations and enhancing customer satisfaction. The recently emerging Large Language Models (LLMs) have demonstrated state-of-the-art performance in numerous attribute extraction tasks, without the need for domain-specific training data. Nevertheless, varying strengths and weaknesses are exhibited by different LLMs due to the diversity in data, architectures, and hyperparameters. This variation makes them complementary to each other, with no single LLM dominating all others. Considering the diverse strengths and weaknesses of LLMs, it becomes necessary to develop an ensemble method that leverages their complementary potentials. In this paper, we propose a novel algorithm called LLM-ensemble to ensemble different LLMs' outputs for attribute value extraction. We iteratively learn the weights for different LLMs to aggregate the labels with weights to predict the final attribute value. Not only can our proposed method be proven theoretically optimal, but it also ensures efficient computation, fast convergence, and safe deployment. We have also conducted extensive experiments with various state-of-the-art LLMs, including Llama2-13B, Llama2-70B, PaLM-2, GPT-3.5, and GPT-4, on Walmart's internal data. Our offline metrics demonstrate that the LLM-ensemble method outperforms all the state-of-the-art single LLMs on Walmart's internal dataset. This method has been launched in several production models, leading to improved Gross Merchandise Volume (GMV), Click-Through Rate (CTR), Conversion Rate (CVR), and Add-to-Cart Rate (ATC).

Via

Access Paper or Ask Questions

Chaining text-to-image and large language model: A novel approach for generating personalized e-commerce banners

Feb 28, 2024

Shanu Vashishtha, Abhinav Prakash, Lalitesh Morishetti, Kaushiki Nag, Yokila Arora, Sushant Kumar, Kannan Achan

Figure 1 for Chaining text-to-image and large language model: A novel approach for generating personalized e-commerce banners

Figure 2 for Chaining text-to-image and large language model: A novel approach for generating personalized e-commerce banners

Figure 3 for Chaining text-to-image and large language model: A novel approach for generating personalized e-commerce banners

Figure 4 for Chaining text-to-image and large language model: A novel approach for generating personalized e-commerce banners

Abstract:Text-to-image models such as stable diffusion have opened a plethora of opportunities for generating art. Recent literature has surveyed the use of text-to-image models for enhancing the work of many creative artists. Many e-commerce platforms employ a manual process to generate the banners, which is time-consuming and has limitations of scalability. In this work, we demonstrate the use of text-to-image models for generating personalized web banners with dynamic content for online shoppers based on their interactions. The novelty in this approach lies in converting users' interaction data to meaningful prompts without human intervention. To this end, we utilize a large language model (LLM) to systematically extract a tuple of attributes from item meta-information. The attributes are then passed to a text-to-image model via prompt engineering to generate images for the banner. Our results show that the proposed approach can create high-quality personalized banners for users.

* 10 pages

Via

Access Paper or Ask Questions

Event-based Product Carousel Recommendation with Query-Click Graph

Feb 05, 2024

Luyi Ma, Nimesh Sinha, Parth Vajge, Jason HD Cho, Sushant Kumar, Kannan Achan

Abstract:Many current recommender systems mainly focus on the product-to-product recommendations and user-to-product recommendations even during the time of events rather than modeling the typical recommendations for the target event (e.g., festivals, seasonal activities, or social activities) without addressing the multiple aspects of the shopping demands for the target event. Product recommendations for the multiple aspects of the target event are usually generated by human curators who manually identify the aspects and select a list of aspect-related products (i.e., product carousel) for each aspect as recommendations. However, building a recommender system with machine learning is non-trivial due to the lack of both the ground truth of event-related aspects and the aspect-related products. To fill this gap, we define the novel problem as the event-based product carousel recommendations in e-commerce and propose an effective recommender system based on the query-click bipartite graph. We apply the iterative clustering algorithm over the query-click bipartite graph and infer the event-related aspects by the clusters of queries. The aspect-related recommendations are powered by the click-through rate of products regarding each aspect. We show through experiments that this approach effectively mines product carousels for the target event.

* 7 pages, 2 figures, 2021 IEEE International Conference on Big Data (Big Data)

Via

Access Paper or Ask Questions

LLMs with User-defined Prompts as Generic Data Operators for Reliable Data Processing

Dec 26, 2023

Luyi Ma, Nikhil Thakurdesai, Jiao Chen, Jianpeng Xu, Evren Korpeoglu, Sushant Kumar, Kannan Achan

Abstract:Data processing is one of the fundamental steps in machine learning pipelines to ensure data quality. Majority of the applications consider the user-defined function (UDF) design pattern for data processing in databases. Although the UDF design pattern introduces flexibility, reusability and scalability, the increasing demand on machine learning pipelines brings three new challenges to this design pattern -- not low-code, not dependency-free and not knowledge-aware. To address these challenges, we propose a new design pattern that large language models (LLMs) could work as a generic data operator (LLM-GDO) for reliable data cleansing, transformation and modeling with their human-compatible performance. In the LLM-GDO design pattern, user-defined prompts (UDPs) are used to represent the data processing logic rather than implementations with a specific programming language. LLMs can be centrally maintained so users don't have to manage the dependencies at the run-time. Fine-tuning LLMs with domain-specific data could enhance the performance on the domain-specific tasks which makes data processing knowledge-aware. We illustrate these advantages with examples in different data processing tasks. Furthermore, we summarize the challenges and opportunities introduced by LLMs to provide a complete view of this design pattern for more discussions.

* 5 pages, 8 figures, 1st IEEE International Workshop on Data Engineering and Modeling for AI (DEMAI), IEEE BigData 2023

Via

Access Paper or Ask Questions