Abstract:Recent advances in low-rank adaptation (LoRA) have enabled efficient fine-tuning of large language models (LLMs) with minimal additional parameters. However, existing LoRA methods apply static rank configurations uniformly across all input tokens, ignoring variation in token complexity and computational requirements. In this work, we propose ChunkWise LoRA, a dynamic and adaptive approach that partitions sequences into variable-length chunks based on token complexity and assigns each chunk a tailored low-rank configuration. Our system introduces a runtime scheduler that estimates token difficulty, performs adaptive chunking, and selects per-chunk LoRA rank and scaling using a rank-ladder mechanism. To preserve output consistency, we further introduce a boundary-safe composition module and integrate policy-driven KV-cache strategies. Experiments on benchmark datasets such as Wikitext-103 and SQuAD demonstrate that ChunkWise LoRA achieves up to 34\% lower latency and 38% memory reduction compared to baseline LoRA, while maintaining or improving task performance metrics like BLEU, EM, and perplexity. The proposed framework remains fully compatible with existing transformer architectures and inference frameworks, providing a practical solution for real-world deployment of parameter-efficient LLMs.




Abstract:Category recommendation for users on an e-Commerce platform is an important task as it dictates the flow of traffic through the website. It is therefore important to surface precise and diverse category recommendations to aid the users' journey through the platform and to help them discover new groups of items. An often understated part in category recommendation is users' proclivity to repeat purchases. The structure of this temporal behavior can be harvested for better category recommendations and in this work, we attempt to harness this through variational inference. Further, to enhance the variational inference based optimization, we initialize the optimizer at better starting points through the well known Metapath2Vec algorithm. We demonstrate our results on two real-world datasets and show that our model outperforms standard baseline methods.



Abstract:Recommender Systems have become an integral part of online e-Commerce platforms, driving customer engagement and revenue. Most popular recommender systems attempt to learn from users' past engagement data to understand behavioral traits of users and use that to predict future behavior. In this work, we present an approach to use causal inference to learn user-attribute affinities through temporal contexts. We formulate this objective as a Probabilistic Machine Learning problem and apply a variational inference based method to estimate the model parameters. We demonstrate the performance of the proposed method on the next attribute prediction task on two real world datasets and show that it outperforms standard baseline methods.