An Mai

Mitigating Reward Over-optimization in Direct Alignment Algorithms with Importance Sampling

Jun 11, 2025

Phuc Minh Nguyen, Ngoc-Hieu Nguyen, Duy H. M. Nguyen, Anji Liu, An Mai, Binh T. Nguyen, Daniel Sonntag, Khoa D. Doan

Abstract: Direct Alignment Algorithms (DAAs) such as Direct Preference Optimization (DPO) have emerged as alternatives to standard Reinforcement Learning from Human Feedback (RLHF) for aligning large language models (LLMs) with human values. However, these methods are more susceptible to over-optimization, in which the model drifts away from the reference policy, leading to degraded performance as training progresses. This paper proposes a novel importance-sampling approach to mitigate the over-optimization problem of offline DAAs. This approach, called IS-DAAs, multiplies the DAA objective by an importance ratio that accounts for the reference policy distribution. IS-DAAs additionally avoid the high variance associated with importance sampling by clipping the importance ratio to a maximum value. Our extensive experiments demonstrate that IS-DAAs can effectively mitigate over-optimization, especially under low regularization strength, and achieve better performance than other methods designed to address this problem. Our implementations are provided publicly at this link.

* First version
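
As a rough illustration of the idea described in the abstract, the sketch below weights a per-example DPO loss by a clipped importance ratio. It is not the authors' implementation. The function name, the default `beta` and `clip_max` values, and in particular the form of the ratio (policy probability over reference probability for the chosen and rejected responses together) are assumptions made for this example; the abstract only states that the ratio accounts for the reference policy and is clipped to a maximum value.

```python
import math


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)), computed as -softplus(-x).
    return -(max(-x, 0.0) + math.log1p(math.exp(-abs(x))))


def clipped_is_dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l,
                        beta=0.1, clip_max=2.0):
    """Hypothetical per-example loss: clipped importance weight times DPO loss.

    logp_w / logp_l are sequence log-probabilities of the chosen / rejected
    response under the current policy; ref_logp_* are the same quantities
    under the reference policy.
    """
    # Standard DPO loss on the implicit reward margin.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    dpo_loss = -log_sigmoid(margin)

    # ASSUMPTION: importance ratio = policy / reference over both responses.
    log_ratio = (logp_w + logp_l) - (ref_logp_w + ref_logp_l)
    # Cap the exponent to avoid overflow, then clip the ratio itself.
    weight = min(math.exp(min(log_ratio, 700.0)), clip_max)

    # In a training loop the weight would usually be treated as a constant
    # (no gradient through it); that is also an assumption of this sketch.
    return weight * dpo_loss, weight
```

When the policy equals the reference, the weight is 1 and the loss reduces to plain DPO. Once the policy drifts far enough that the ratio exceeds `clip_max`, the weight saturates, which is how clipping bounds the variance of the weighted objective.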

Improved sparse PCA method for face and image recognition

Dec 01, 2021

Loc Hoang Tran, Tuan Tran, An Mai

Abstract: Face recognition is a significant field within pattern recognition, with applications in the military and finance, to name a few. In this paper, sparse PCA is combined with the nearest-neighbor method (and with kernel ridge regression) and applied to the face recognition problem. Experimental results show that combining sparse PCA (solved with the proximal gradient method or the FISTA method) with a given classifier can yield lower accuracy than combining standard PCA with the same classifier, but in some cases it yields better accuracy. Moreover, we observe that computing sparse PCA with the FISTA method is always faster than computing it with the proximal gradient method.

* 11 pages. arXiv admin note: substantial text overlap with arXiv:1904.08496
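
To make the two solvers named in the abstract concrete, here is a minimal sketch of the proximal gradient method and FISTA on a generic L1-regularized least-squares problem, min 0.5*||Ax - b||^2 + lam*||x||_1. This is not the paper's sparse PCA formulation, which the abstract does not spell out; the problem, the function names, and the step-size choice (one over the squared Frobenius norm of A, a safe upper bound on the Lipschitz constant) are assumptions for illustration.

```python
import math


def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1, applied element-wise.
    return [max(abs(x) - t, 0.0) * (1.0 if x > 0 else -1.0) for x in v]


def grad_least_squares(A, b, x):
    # Gradient of 0.5 * ||Ax - b||^2, i.e. A^T (Ax - b).
    r = [sum(a * xj for a, xj in zip(row, x)) - bi for row, bi in zip(A, b)]
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(x))]


def l1_least_squares(A, b, lam, iters=200, accelerate=True):
    """Proximal gradient (accelerate=False) or FISTA (accelerate=True)."""
    n = len(A[0])
    # ||A||_F^2 upper-bounds the largest eigenvalue of A^T A.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * n
    y = x[:]  # extrapolated point; equals x when not accelerating
    t = 1.0
    for _ in range(iters):
        g = grad_least_squares(A, b, y)
        x_new = soft_threshold([yj - step * gj for yj, gj in zip(y, g)],
                               step * lam)
        if accelerate:
            # FISTA momentum update.
            t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
            y = [xn + (t - 1.0) / t_new * (xn - xo)
                 for xn, xo in zip(x_new, x)]
            t = t_new
        else:
            y = x_new
        x = x_new
    return x
```

The only difference between the two methods is the momentum step: FISTA takes the gradient at an extrapolated point `y` instead of at the current iterate, which improves the worst-case convergence rate from O(1/k) to O(1/k^2).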

Text classification problems via BERT embedding method and graph convolutional neural network

Nov 30, 2021

Loc Hoang Tran, Tuan Tran, An Mai

Abstract: This paper presents a novel way of combining the BERT embedding method with the graph convolutional neural network to solve the text classification problem. First, we apply the BERT embedding method to the texts (in the BBC news dataset and the IMDB movie reviews dataset) to transform each text into a numerical vector. Then, the graph convolutional neural network is applied to these vectors to classify the texts into their appropriate classes/labels. Experiments show that the graph convolutional neural network model performs better than combinations of the BERT embedding method with classical machine learning models.

* 12 pages

A Simplified Framework for Air Route Clustering Based on ADS-B Data

Jul 07, 2021

Quan Duong, Tan Tran, Duc-Thinh Pham, An Mai

Abstract: The volume of flight traffic keeps increasing over time, which makes strategic traffic flow management a challenging problem, since modeling the entire traffic data requires substantial computational resources. Meanwhile, Automatic Dependent Surveillance - Broadcast (ADS-B) technology is considered a promising data technology for providing flight crews and ground control staff with the necessary information about the position and velocity of aircraft in a specific area, safely and efficiently. To tackle this problem, we present a simplified framework that supports detecting the typical air routes between airports based on ADS-B data. Specifically, flight traffic is classified into major groups based on similarity measures, which reduces the number of flight paths between airports. As a result, our framework can be used to reduce the computational cost of air flow optimization in practice and to evaluate operational performance. Finally, to illustrate the potential applications of the proposed framework, an experiment was performed using ADS-B flight traffic data for three different pairs of airports. The typical routes detected between each pair of airports show promising results, based on combining two indices for measuring clustering performance with human judgment through visual inspection.

* 2019 IEEE-RIVF International Conference on Computing and Communication Technologies (RIVF)

PageRank algorithm for Directed Hypergraph

Aug 29, 2019

Loc Tran, Tho Quan, An Mai

Abstract: Over the last two decades, the World Wide Web's link structure has commonly been modeled as a directed graph. In this paper, we model the World Wide Web's link structure as a directed hypergraph and develop a PageRank algorithm for this directed hypergraph. Because World Wide Web directed hypergraph datasets are lacking, we apply the PageRank algorithm to a metabolic network, which is itself a directed hypergraph. The experiments show that our novel PageRank algorithm is successfully applied to this metabolic network.

* 6 pages
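
One plausible way to run PageRank on a directed hypergraph is sketched below. It is not the algorithm from the paper, whose details the abstract does not give. The random-walk rule is an assumption: from a node, pick uniformly among the hyperedges whose tail set contains it, then move to a uniformly chosen node in that hyperedge's head set. The function name and the `(tails, heads)` encoding are likewise hypothetical.

```python
def hypergraph_pagerank(num_nodes, hyperedges, damping=0.85, iters=200):
    """Power-iteration PageRank under an assumed hypergraph random walk.

    hyperedges is a list of (tails, heads) pairs of node-index collections;
    each head set is assumed to be non-empty. In a metabolic network a
    hyperedge could be a reaction with substrates as tails and products
    as heads.
    """
    # out_edges[u] holds the head sets of all hyperedges leaving node u.
    out_edges = [[] for _ in range(num_nodes)]
    for tails, heads in hyperedges:
        for u in tails:
            out_edges[u].append(list(heads))

    rank = [1.0 / num_nodes] * num_nodes
    for _ in range(iters):
        # Teleportation mass shared equally by all nodes.
        new = [(1.0 - damping) / num_nodes] * num_nodes
        for u in range(num_nodes):
            if not out_edges[u]:
                # Dangling node: spread its rank uniformly.
                share = damping * rank[u] / num_nodes
                for v in range(num_nodes):
                    new[v] += share
                continue
            per_edge = damping * rank[u] / len(out_edges[u])
            for heads in out_edges[u]:
                for v in heads:
                    new[v] += per_edge / len(heads)
        rank = new
    return rank
```

For example, `hypergraph_pagerank(3, [({0, 1}, {2}), ({2}, {0, 1})])` ranks node 2 above nodes 0 and 1, since both tail nodes send all of their mass to it while it splits its own mass between them.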

Solve fraud detection problem by using graph based learning methods

Aug 29, 2019

Loc Tran, Tuan Tran, Linh Tran, An Mai

Abstract: Detecting fraudulent credit card transactions is an important problem in machine learning. Detecting these transactions helps reduce significant losses for credit card holders and banks. To detect fraudulent credit card transactions, data scientists normally employ unsupervised and supervised learning techniques. In this paper, we employ graph p-Laplacian based semi-supervised learning methods combined with undersampling techniques such as Cluster Centroids to solve the credit card fraud detection problem. Experimental results show that the graph p-Laplacian semi-supervised learning methods outperform the current state-of-the-art graph Laplacian based semi-supervised learning method (p=2).

* 9 pages. arXiv admin note: substantial text overlap with arXiv:1811.02986
