Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Qiyao Wang

IPBench: Benchmarking the Knowledge of Large Language Models in Intellectual Property

Apr 22, 2025

Qiyao Wang, Guhong Chen, Hongbo Wang, Huaren Liu, Minghui Zhu, Zhifei Qin, Linwei Li, Yilin Yue, Shiqiang Wang, Jiayan Li(+13 more)

Abstract:Intellectual Property (IP) is a unique domain that integrates technical and legal knowledge, making it inherently complex and knowledge-intensive. As large language models (LLMs) continue to advance, they show great potential for processing IP tasks, enabling more efficient analysis, understanding, and generation of IP-related content. However, existing datasets and benchmarks either focus narrowly on patents or cover limited aspects of the IP field, lacking alignment with real-world scenarios. To bridge this gap, we introduce the first comprehensive IP task taxonomy and a large, diverse bilingual benchmark, IPBench, covering 8 IP mechanisms and 20 tasks. This benchmark is designed to evaluate LLMs in real-world intellectual property applications, encompassing both understanding and generation. We benchmark 16 LLMs, ranging from general-purpose to domain-specific models, and find that even the best-performing model achieves only 75.8% accuracy, revealing substantial room for improvement. Notably, open-source IP and law-oriented models lag behind closed-source general-purpose models. We publicly release all data and code of IPBench and will continue to update it with additional IP-related tasks to better reflect real-world challenges in the intellectual property domain.

* 89 pages, 75 figures, 55 tables

Via

Access Paper or Ask Questions

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

Feb 20, 2025

M-A-P Team, Xinrun Du, Yifan Yao, Kaijing Ma, Bingli Wang, Tianyu Zheng, Kang Zhu, Minghao Liu, Yiming Liang, Xiaolong Jin(+85 more)

Abstract:Large language models (LLMs) have demonstrated remarkable proficiency in mainstream academic disciplines such as mathematics, physics, and computer science. However, human knowledge encompasses over 200 specialized disciplines, far exceeding the scope of existing benchmarks. The capabilities of LLMs in many of these specialized fields-particularly in light industry, agriculture, and service-oriented disciplines-remain inadequately evaluated. To address this gap, we present SuperGPQA, a comprehensive benchmark that evaluates graduate-level knowledge and reasoning capabilities across 285 disciplines. Our benchmark employs a novel Human-LLM collaborative filtering mechanism to eliminate trivial or ambiguous questions through iterative refinement based on both LLM responses and expert feedback. Our experimental results reveal significant room for improvement in the performance of current state-of-the-art LLMs across diverse knowledge domains (e.g., the reasoning-focused model DeepSeek-R1 achieved the highest accuracy of 61.82% on SuperGPQA), highlighting the considerable gap between current model capabilities and artificial general intelligence. Additionally, we present comprehensive insights from our management of a large-scale annotation process, involving over 80 expert annotators and an interactive Human-LLM collaborative system, offering valuable methodological guidance for future research initiatives of comparable scope.

Via

Access Paper or Ask Questions

AutoPatent: A Multi-Agent Framework for Automatic Patent Generation

Dec 13, 2024

Qiyao Wang, Shiwen Ni, Huaren Liu, Shule Lu, Guhong Chen, Xi Feng, Chi Wei, Qiang Qu, Hamid Alinejad-Rokny, Yuan Lin(+1 more)

Abstract:As the capabilities of Large Language Models (LLMs) continue to advance, the field of patent processing has garnered increased attention within the natural language processing community. However, the majority of research has been concentrated on classification tasks, such as patent categorization and examination, or on short text generation tasks like patent summarization and patent quizzes. In this paper, we introduce a novel and practical task known as Draft2Patent, along with its corresponding D2P benchmark, which challenges LLMs to generate full-length patents averaging 17K tokens based on initial drafts. Patents present a significant challenge to LLMs due to their specialized nature, standardized terminology, and extensive length. We propose a multi-agent framework called AutoPatent which leverages the LLM-based planner agent, writer agents, and examiner agent with PGTree and RRAG to generate lengthy, intricate, and high-quality complete patent documents. The experimental results demonstrate that our AutoPatent framework significantly enhances the ability to generate comprehensive patents across various LLMs. Furthermore, we have discovered that patents generated solely with the AutoPatent framework based on the Qwen2.5-7B model outperform those produced by larger and more powerful LLMs, such as GPT-4o, Qwen2.5-72B, and LLAMA3.1-70B, in both objective metrics and human evaluations. We will make the data and code available upon acceptance at \url{https://github.com/QiYao-Wang/AutoPatent}.

* 19 pages, 7 figures

Via

Access Paper or Ask Questions

IPEval: A Bilingual Intellectual Property Agency Consultation Evaluation Benchmark for Large Language Models

Jun 18, 2024

Qiyao Wang, Jianguo Huang, Shule Lu, Yuan Lin, Kan Xu, Liang Yang, Hongfei Lin

Figure 1 for IPEval: A Bilingual Intellectual Property Agency Consultation Evaluation Benchmark for Large Language Models

Figure 2 for IPEval: A Bilingual Intellectual Property Agency Consultation Evaluation Benchmark for Large Language Models

Figure 3 for IPEval: A Bilingual Intellectual Property Agency Consultation Evaluation Benchmark for Large Language Models

Figure 4 for IPEval: A Bilingual Intellectual Property Agency Consultation Evaluation Benchmark for Large Language Models

Abstract:The rapid development of Large Language Models (LLMs) in vertical domains, including intellectual property (IP), lacks a specific evaluation benchmark for assessing their understanding, application, and reasoning abilities. To fill this gap, we introduce IPEval, the first evaluation benchmark tailored for IP agency and consulting tasks. IPEval comprises 2657 multiple-choice questions across four major dimensions: creation, application, protection, and management of IP. These questions span patent rights (inventions, utility models, designs), trademarks, copyrights, trade secrets, and other related laws. Evaluation methods include zero-shot, 5-few-shot, and Chain of Thought (CoT) for seven LLM types, predominantly in English or Chinese. Results show superior English performance by models like GPT series and Qwen series, while Chinese-centric LLMs excel in Chinese tests, albeit specialized IP LLMs lag behind general-purpose ones. Regional and temporal aspects of IP underscore the need for LLMs to grasp legal nuances and evolving laws. IPEval aims to accurately gauge LLM capabilities in IP and spur development of specialized models. Website: \url{https://ipeval.github.io/}

Via

Access Paper or Ask Questions

1st Place Solution to ICDAR 2021 RRC-ICTEXT End-to-end Text Spotting and Aesthetic Assessment on Integrated Circuit

Apr 08, 2021

Qiyao Wang, Pengfei Li, Li Zhu, Yi Niu

Figure 1 for 1st Place Solution to ICDAR 2021 RRC-ICTEXT End-to-end Text Spotting and Aesthetic Assessment on Integrated Circuit

Figure 2 for 1st Place Solution to ICDAR 2021 RRC-ICTEXT End-to-end Text Spotting and Aesthetic Assessment on Integrated Circuit

Figure 3 for 1st Place Solution to ICDAR 2021 RRC-ICTEXT End-to-end Text Spotting and Aesthetic Assessment on Integrated Circuit

Figure 4 for 1st Place Solution to ICDAR 2021 RRC-ICTEXT End-to-end Text Spotting and Aesthetic Assessment on Integrated Circuit

Abstract:This paper presents our proposed methods to ICDAR 2021 Robust Reading Challenge - Integrated Circuit Text Spotting and Aesthetic Assessment (ICDAR RRC-ICTEXT 2021). For the text spotting task, we detect the characters on integrated circuit and classify them based on yolov5 detection model. We balance the lowercase and non-lowercase by using SynthText, generated data and data sampler. We adopt semi-supervised algorithm and distillation to furtherly improve the model's accuracy. For the aesthetic assessment task, we add a classification branch of 3 classes to differentiate the aesthetic classes of each character. Finally, we make model deployment to accelerate inference speed and reduce memory consumption based on NVIDIA Tensorrt. Our methods achieve 59.1 mAP on task 3.1 with 31 FPS and 306M memory (rank 1), 78.7\% F2 score on task 3.2 with 30 FPS and 306M memory (rank 1).

* 5 pages

Via

Access Paper or Ask Questions

Deep Time Series Models for Scarce Data

Mar 16, 2021

Qiyao Wang, Ahmed Farahat, Chetan Gupta, Shuai Zheng

Figure 1 for Deep Time Series Models for Scarce Data

Figure 2 for Deep Time Series Models for Scarce Data

Figure 3 for Deep Time Series Models for Scarce Data

Figure 4 for Deep Time Series Models for Scarce Data

Abstract:Time series data have grown at an explosive rate in numerous domains and have stimulated a surge of time series modeling research. A comprehensive comparison of different time series models, for a considered data analytics task, provides useful guidance on model selection for data analytics practitioners. Data scarcity is a universal issue that occurs in a vast range of data analytics problems, due to the high costs associated with collecting, generating, and labeling data as well as some data quality issues such as missing data. In this paper, we focus on the temporal classification/regression problem that attempts to build a mathematical mapping from multivariate time series inputs to a discrete class label or a real-valued response variable. For this specific problem, we identify two types of scarce data: scarce data with small samples and scarce data with sparsely and irregularly observed time series covariates. Observing that all existing works are incapable of utilizing the sparse time series inputs for proper modeling building, we propose a model called sparse functional multilayer perceptron (SFMLP) for handling the sparsity in the time series covariates. The effectiveness of the proposed SFMLP under each of the two types of data scarcity, in comparison with the conventional deep sequential learning models (e.g., Recurrent Neural Network, and Long Short-Term Memory), is investigated through mathematical arguments and numerical experiments.

* Journal paper to appear in Neurocomputing

Via

Access Paper or Ask Questions

A Non-linear Function-on-Function Model for Regression with Time Series Data

Nov 24, 2020

Qiyao Wang, Haiyan Wang, Chetan Gupta, Aniruddha Rajendra Rao, Hamed Khorasgani

Figure 1 for A Non-linear Function-on-Function Model for Regression with Time Series Data

Figure 2 for A Non-linear Function-on-Function Model for Regression with Time Series Data

Figure 3 for A Non-linear Function-on-Function Model for Regression with Time Series Data

Figure 4 for A Non-linear Function-on-Function Model for Regression with Time Series Data

Abstract:In the last few decades, building regression models for non-scalar variables, including time series, text, image, and video, has attracted increasing interests of researchers from the data analytic community. In this paper, we focus on a multivariate time series regression problem. Specifically, we aim to learn mathematical mappings from multiple chronologically measured numerical variables within a certain time interval S to multiple numerical variables of interest over time interval T. Prior arts, including the multivariate regression model, the Seq2Seq model, and the functional linear models, suffer from several limitations. The first two types of models can only handle regularly observed time series. Besides, the conventional multivariate regression models tend to be biased and inefficient, as they are incapable of encoding the temporal dependencies among observations from the same time series. The sequential learning models explicitly use the same set of parameters along time, which has negative impacts on accuracy. The function-on-function linear model in functional data analysis (a branch of statistics) is insufficient to capture complex correlations among the considered time series and suffer from underfitting easily. In this paper, we propose a general functional mapping that embraces the function-on-function linear model as a special case. We then propose a non-linear function-on-function model using the fully connected neural network to learn the mapping from data, which addresses the aforementioned concerns in the existing approaches. For the proposed model, we describe in detail the corresponding numerical implementation procedures. The effectiveness of the proposed model is demonstrated through the application to two real-world problems.

* Accepted by IEEE Big Data 2020

Via

Access Paper or Ask Questions

Spatio-Temporal Functional Neural Networks

Sep 11, 2020

Aniruddha Rajendra Rao, Qiyao Wang, Haiyan Wang, Hamed Khorasgani, Chetan Gupta

Figure 1 for Spatio-Temporal Functional Neural Networks

Figure 2 for Spatio-Temporal Functional Neural Networks

Figure 3 for Spatio-Temporal Functional Neural Networks

Figure 4 for Spatio-Temporal Functional Neural Networks

Abstract:Explosive growth in spatio-temporal data and its wide range of applications have attracted increasing interests of researchers in the statistical and machine learning fields. The spatio-temporal regression problem is of paramount importance from both the methodology development and real-world application perspectives. Given the observed spatially encoded time series covariates and real-valued response data samples, the goal of spatio-temporal regression is to leverage the temporal and spatial dependencies to build a mapping from covariates to response with minimized prediction error. Prior arts, including the convolutional Long Short-Term Memory (CovLSTM) and variations of the functional linear models, cannot learn the spatio-temporal information in a simple and efficient format for proper model building. In this work, we propose two novel extensions of the Functional Neural Network (FNN), a temporal regression model whose effectiveness and superior performance over alternative sequential models have been proven by many researchers. The effectiveness of the proposed spatio-temporal FNNs in handling varying spatial correlations is demonstrated in comprehensive simulation studies. The proposed models are then deployed to solve a practical and challenging precipitation prediction problem in the meteorology field.

* Accepted by 2020 IEEE International Conference on Data Science and Advanced Analytics (DSAA)

Via

Access Paper or Ask Questions

Health Indicator Forecasting for Improving Remaining Useful Life Estimation

Jun 05, 2020

Qiyao Wang, Ahmed Farahat, Chetan Gupta, Haiyan Wang

Figure 1 for Health Indicator Forecasting for Improving Remaining Useful Life Estimation

Figure 2 for Health Indicator Forecasting for Improving Remaining Useful Life Estimation

Figure 3 for Health Indicator Forecasting for Improving Remaining Useful Life Estimation

Figure 4 for Health Indicator Forecasting for Improving Remaining Useful Life Estimation

Abstract:Prognostics is concerned with predicting the future health of the equipment and any potential failures. With the advances in the Internet of Things (IoT), data-driven approaches for prognostics that leverage the power of machine learning models are gaining popularity. One of the most important categories of data-driven approaches relies on a predefined or learned health indicator to characterize the equipment condition up to the present time and make inference on how it is likely to evolve in the future. In these approaches, health indicator forecasting that constructs the health indicator curve over the lifespan using partially observed measurements (i.e., health indicator values within an initial period) plays a key role. Existing health indicator forecasting algorithms, such as the functional Empirical Bayesian approach, the regression-based formulation, a naive scenario matching based on the nearest neighbor, have certain limitations. In this paper, we propose a new `generative + scenario matching' algorithm for health indicator forecasting. The key idea behind the proposed approach is to first non-parametrically fit the underlying health indicator curve with a continuous Gaussian Process using a sample of run-to-failure health indicator curves. The proposed approach then generates a rich set of random curves from the learned distribution, attempting to obtain all possible variations of the target health condition evolution process over the system's lifespan. The health indicator extrapolation for a piece of functioning equipment is inferred as the generated curve that has the highest matching level within the observed period. Our experimental results show the superiority of our algorithm over the other state-of-the-art methods.

* Accepted by IEEE International Conference on Prognostics and Health Management 2020

Via

Access Paper or Ask Questions

Regularized Operating Envelope with Interpretability and Implementability Constraints

Dec 21, 2019

Qiyao Wang, Haiyan Wang, Chetan Gupta, Susumu Serita

Figure 1 for Regularized Operating Envelope with Interpretability and Implementability Constraints

Figure 2 for Regularized Operating Envelope with Interpretability and Implementability Constraints

Figure 3 for Regularized Operating Envelope with Interpretability and Implementability Constraints

Figure 4 for Regularized Operating Envelope with Interpretability and Implementability Constraints

Abstract:Operating envelope is an important concept in industrial operations. Accurate identification for operating envelope can be extremely beneficial to stakeholders as it provides a set of operational parameters that optimizes some key performance indicators (KPI) such as product quality, operational safety, equipment efficiency, environmental impact, etc. Given the importance, data-driven approaches for computing the operating envelope are gaining popularity. These approaches typically use classifiers such as support vector machines, to set the operating envelope by learning the boundary in the operational parameter spaces between the manually assigned `large KPI' and `small KPI' groups. One challenge to these approaches is that the assignment to these groups is often ad-hoc and hence arbitrary. However, a bigger challenge with these approaches is that they don't take into account two key features that are needed to operationalize operating envelopes: (i) interpretability of the envelope by the operator and (ii) implementability of the envelope from a practical standpoint. In this work, we propose a new definition for operating envelope which directly targets the expected magnitude of KPI (i.e., no need to arbitrarily bin the data instances into groups) and accounts for the interpretability and the implementability. We then propose a regularized `GA + penalty' algorithm that outputs an envelope where the user can tradeoff between bias and variance. The validity of our proposed algorithm is demonstrated by two sets of simulation studies and an application to a real-world challenge in the mining processes of a flotation plant.

* In the proceedings of the 2019 IEEE Big Data Conference

Via

Access Paper or Ask Questions