Abstract: Question answering systems provide short, precise, and specific answers to questions. Many robust question answering systems have been developed for English, whereas lower-resource languages such as Persian have few standard datasets. In this study, we present NextQuAD, a comprehensive open-domain Persian dataset containing 7,515 contexts with 23,918 question-answer pairs. A BERT-based question answering model is then trained on this dataset using two pre-trained language models, ParsBERT and XLM-RoBERTa, and the outputs of the two models are ensembled by averaging their logits. Evaluation on the development set yields an Exact Match (EM) of 0.95 and an F1 score of 0.97. To compare NextQuAD with other Persian datasets, the model trained on NextQuAD is also evaluated on two other datasets, PersianQA and ParSQuAD. The comparisons show that the proposed model increases EM by 0.39 on PersianQA and by 0.14 on ParSQuAD-manual, while EM declines slightly, by 0.007, on ParSQuAD-automatic.
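As a rough illustration of ensembling by mean logits in extractive question answering, the sketch below averages per-token start and end logits from several models and then selects the highest-scoring answer span. It assumes, which the abstract does not state, that both models' logits have already been aligned to the same token positions. ParsBERT and XLM-RoBERTa use different tokenizers, so a real pipeline would need an alignment step, for example via character offsets. The function names and the `max_answer_len` parameter are hypothetical. The exact-match check uses only whitespace normalization, a simplification of standard EM scoring.

```python
from typing import List, Sequence, Tuple


def mean_logits(logits_per_model: Sequence[Sequence[float]]) -> List[float]:
    """Average logits position by position across models (assumes aligned positions)."""
    length = len(logits_per_model[0])
    if any(len(logits) != length for logits in logits_per_model):
        raise ValueError("all models must produce logits over the same positions")
    n_models = len(logits_per_model)
    return [sum(logits[i] for logits in logits_per_model) / n_models for i in range(length)]


def best_span(start_logits: Sequence[float], end_logits: Sequence[float],
              max_answer_len: int = 30) -> Tuple[int, int]:
    """Return (start, end) token indices maximizing start + end logit, with start <= end."""
    best, best_score = (0, 0), float("-inf")
    for s, s_score in enumerate(start_logits):
        # Only consider ends at or after the start and within the length limit.
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_score + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best


def exact_match(prediction: str, reference: str) -> int:
    """Simplified EM: 1 if strings match after collapsing whitespace, else 0."""
    return int(" ".join(prediction.split()) == " ".join(reference.split()))
```

For two models, one would call `mean_logits([start_a, start_b])` and `mean_logits([end_a, end_b])`, pass the results to `best_span`, and map the token indices back to answer text. Averaging raw logits rather than probabilities keeps the ensemble a simple linear combination of the models' scores.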
Abstract: With the growing amount of information generated on the web, many web service providers use recommender systems to personalize their services and make content easier to access. Recommender systems that only try to increase accuracy (i.e., the similarity of items to users' interests) suffer from the long-tail problem: popular items, called the short head, appear in recommendation lists more often than others because they have many ratings, whereas unpopular items, called long-tail items, are recommended less often because including them reduces accuracy. Previous studies addressing the long-tail problem treat users' interests as constant, even though preferences change over time. We argue that users' dynamic preferences should be taken into account to avoid losing accuracy when long-tail items are included in recommendation lists. This study identifies two reasons for this: 1) over time, users rate different proportions of popular and unpopular items; 2) users of different ages have different levels of interest in popular and unpopular items. As a result, recommendation lists can be built with proportions of long-tail and short-head items that vary over time. In addition, we predict users' ages from their item ratings so that more long-tail items can be used. The results show that, by accounting for these two factors, the accuracy of recommendation lists reaches 91%. At the same time, the approach mitigates the long-tail problem better than related work and yields greater diversity in recommendation lists in the long run.
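The idea of building lists with a varying share of long-tail items can be sketched as follows. This is a minimal illustration under our own assumptions, not the method of the study: candidates are assumed to be already split into ranked short-head and long-tail pools, and `long_tail_fraction` is a hypothetical parameter that could, for instance, be chosen per time period or per predicted age group. The abstract does not specify how that fraction is derived.

```python
from typing import List, Sequence


def mix_recommendations(short_head: Sequence[str], long_tail: Sequence[str],
                        list_size: int, long_tail_fraction: float) -> List[str]:
    """Build a list of up to list_size items with roughly the requested long-tail share.

    Both pools are assumed to be ranked best-first. If one pool runs short,
    the remaining slots are filled from the other pool.
    """
    if not 0.0 <= long_tail_fraction <= 1.0:
        raise ValueError("long_tail_fraction must be in [0, 1]")
    # Number of long-tail slots, capped by how many long-tail candidates exist.
    n_tail = min(round(list_size * long_tail_fraction), len(long_tail))
    # Remaining slots go to the short head, capped by its size.
    n_head = min(list_size - n_tail, len(short_head))
    result = list(short_head[:n_head]) + list(long_tail[:n_tail])
    # If the short-head pool was too small, top up with further long-tail items.
    extra = list_size - len(result)
    if extra > 0:
        result += list(long_tail[n_tail:n_tail + extra])
    return result
```

A time-aware recommender could call this with a different `long_tail_fraction` for each period, raising the share of long-tail items when a user's recent ratings suggest more interest in unpopular items.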
Abstract: Due to the increasing amount of information generated on the web, most web service providers now try to personalize their services. Users interact with web-based systems in multiple ways, including stating their interests and preferences by rating the items provided. This paper proposes a framework that predicts users' demographics from the ratings they register in a system. To the best of our knowledge, this is the first work to employ item ratings for user demographic prediction, a problem that has been studied extensively in recommender systems and service personalization. We apply the framework to the ratings in the MovieLens dataset and predict users' age and gender. Experimental results show that using all ratings registered by users improves prediction accuracy by at least 16% compared with previously studied models. Moreover, by classifying items as popular or unpopular, we can discard the ratings that belong to 95% of items while still reaching an acceptable level of accuracy. This significantly reduces update costs in a time-varying environment. In addition to this classification, we propose other methods that reduce data volume while keeping predictions accurate.
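The popularity-based data reduction can be illustrated with a short sketch. It assumes, since the abstract does not say which group is kept, that popularity is measured by rating count and that only the ratings of the most-rated items (5% by default) are retained; the function name, the `keep_fraction` parameter, and the `(user_id, item_id, rating)` tuple layout are hypothetical. The filtered ratings would then feed a demographic classifier, which is not shown.

```python
from collections import Counter
from math import ceil
from typing import List, Tuple

Rating = Tuple[int, int, float]  # hypothetical layout: (user_id, item_id, rating)


def filter_to_popular_items(ratings: List[Rating], keep_fraction: float = 0.05) -> List[Rating]:
    """Keep only ratings of the most-rated items, discarding the rest.

    With keep_fraction=0.05, ratings of roughly 95% of items are dropped.
    Ties in rating count are broken by first appearance in `ratings`.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    # Popularity = number of ratings each item received.
    counts = Counter(item for _, item, _ in ratings)
    n_keep = max(1, ceil(len(counts) * keep_fraction))
    popular = {item for item, _ in counts.most_common(n_keep)}
    return [r for r in ratings if r[1] in popular]
```

Because only the popular items' ratings need to be stored and refreshed, new ratings for discarded items can be ignored, which is one plausible way such filtering lowers update costs when ratings change over time.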