Abstract: The rise of Large Language Models (LLMs) has brought about concerns regarding copyright infringement and unethical practices in data and model usage. For instance, slightly modified versions of existing LLMs may be falsely presented as newly developed models, raising issues of model copying and violations of ownership rights. This paper addresses these challenges by introducing a novel metric for quantifying LLM similarity, which leverages perplexity curves and differences in Menger curvature. Comprehensive experiments validate our methodology, demonstrating that it outperforms baseline methods and generalizes across diverse models and domains. Furthermore, simulations show that our approach can detect model replication, underscoring its potential to preserve the originality and integrity of LLMs. Code is available at https://github.com/zyttt-coder/LLM_similarity.
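As background for the quantity named above, the Menger curvature of three points $p_1, p_2, p_3 \in \mathbb{R}^2$ is the reciprocal of the radius of their circumscribed circle:

$$ c(p_1, p_2, p_3) = \frac{1}{R} = \frac{4\,A(p_1, p_2, p_3)}{\lVert p_1 - p_2 \rVert\,\lVert p_2 - p_3 \rVert\,\lVert p_3 - p_1 \rVert}, $$

where $A(p_1, p_2, p_3)$ denotes the area of the triangle with those vertices. This is the standard definition only; how curvature differences along perplexity curves are aggregated into the proposed similarity metric is specified in the body of the paper.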